Method and device for speech enhancement in the presence of background noise

ABSTRACT

In one aspect thereof the invention provides a method for noise suppression of a speech signal that includes, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, determining a value of a scaling gain for at least some of said frequency bins and calculating smoothed scaling gain values. Calculating smoothed scaling gain values includes, for the at least some of the frequency bins, combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain. In another aspect a method partitions the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between, where the boundary frequency differentiates between noise suppression techniques, and changes a value of the boundary frequency as a function of the spectral content of the speech signal.

FIELD OF THE INVENTION

The present invention relates to a technique for enhancing speech signals to improve communication in the presence of background noise. In particular but not exclusively, the present invention relates to the design of a noise reduction system that reduces the level of background noise in the speech signal.

BACKGROUND OF THE INVENTION

Reducing the level of background noise is very important in many communication systems. For example, mobile phones are used in many environments where a high level of background noise is present. Such environments include cars (where usage is increasingly hands-free) and the street, where the communication system needs to operate in the presence of high levels of car noise or street noise. In office applications, such as video-conferencing and hands-free Internet applications, the system needs to cope efficiently with office noise. Other types of ambient noise can also be encountered in practice. Noise reduction, also known as noise suppression or speech enhancement, becomes important for these applications, which often need to operate at low signal-to-noise ratios (SNR). Noise reduction is also important in automatic speech recognition systems, which are increasingly employed in a variety of real environments. Noise reduction improves the performance of the speech coding or speech recognition algorithms usually used in the above-mentioned applications.

Spectral subtraction is one of the most widely used techniques for noise reduction (see S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 113-120, April 1979). Spectral subtraction attempts to estimate the short-time spectral magnitude of speech by subtracting a noise estimate from the noisy speech. The phase of the noisy speech is not processed, based on the assumption that phase distortion is not perceived by the human ear. In practice, spectral subtraction is implemented by forming an SNR-based gain function from the estimates of the noise spectrum and the noisy speech spectrum. This gain function is multiplied by the input spectrum to suppress frequency components with low SNR. The main disadvantage of conventional spectral subtraction algorithms is the resulting musical residual noise, consisting of “musical tones” that disturb the listener as well as subsequent signal processing algorithms (such as speech coding). The musical tones are mainly due to variance in the spectrum estimates. To solve this problem, spectral smoothing has been suggested, which reduces the variance but also the frequency resolution. Another known method to reduce the musical tones is to use an over-subtraction factor in combination with a spectral floor (see M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc. IEEE ICASSP, Washington, D.C., April 1979, pp. 208-211). This method has the disadvantage of degrading the speech when the musical tones are sufficiently reduced. Other approaches are soft-decision noise suppression filtering (see R. J. McAulay and M. L. Malpass, “Speech enhancement using a soft decision noise suppression filter,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 137-145, April 1980) and nonlinear spectral subtraction (see P. Lockwood and J. Boudy, “Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and projection, for robust recognition in cars,” Speech Commun., vol. 11, pp. 215-228, June 1992).

SUMMARY OF THE INVENTION

In one aspect thereof this invention provides a method for noise suppression of a speech signal that includes, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, determining a value of a scaling gain for at least some of said frequency bins and calculating smoothed scaling gain values. Calculating smoothed scaling gain values comprises, for the at least some of the frequency bins, combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain.

In another aspect thereof this invention provides a method for noise suppression of a speech signal that includes, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, partitioning the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between, where the boundary frequency differentiates between noise suppression techniques, and changing a value of the boundary frequency as a function of the spectral content of the speech signal.

In a further aspect thereof this invention provides a speech encoder that comprises a noise suppressor for a speech signal having a frequency domain representation dividable into a plurality of frequency bins. The noise suppressor is operable to determine a value of a scaling gain for at least some of the frequency bins and to calculate smoothed scaling gain values for the at least some of the frequency bins by combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain.

In a still further aspect thereof this invention provides a speech encoder that comprises a noise suppressor for a speech signal having a frequency domain representation dividable into a plurality of frequency bins. The noise suppressor is operable to partition the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between. The boundary frequency differentiates between noise suppression techniques. The noise suppressor is further operable to change a value of the boundary frequency as a function of the spectral content of the speech signal.

In another aspect thereof this invention provides a computer program embodied on a computer readable medium that comprises program instructions for performing noise suppression of a speech signal comprising operations of, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, determining a value of a scaling gain for at least some of said frequency bins and calculating smoothed scaling gain values, comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain.

In another aspect thereof this invention provides a computer program embodied on a computer readable medium that comprises program instructions for performing noise suppression of a speech signal comprising operations of, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, partitioning the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between and changing a value of the boundary frequency as a function of the spectral content of the speech signal.

In a still further and certainly non-limiting aspect thereof this invention provides a speech encoder that includes means for suppressing noise in a speech signal having a frequency domain representation dividable into a plurality of frequency bins. The noise suppressing means comprises means for partitioning the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary there between, and for changing the boundary as a function of the spectral content of the speech signal. The noise suppressing means further comprises means for determining a value of a scaling gain for at least some of the frequency bins and for calculating smoothed scaling gain values for the at least some of the frequency bins by combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain. Calculating a smoothed scaling gain value preferably uses a smoothing factor having a value determined so that smoothing is stronger for smaller values of scaling gain. The noise suppressing means further comprises means for determining a value of a scaling gain for at least some frequency bands, where a frequency band comprises at least two frequency bins, and for calculating smoothed frequency band scaling gain values. The noise suppressing means further comprises means for scaling a frequency spectrum of the speech signal using the smoothed scaling gains, where for frequencies less than the boundary the scaling is performed on a per frequency bin basis, and for frequencies above the boundary the scaling is performed on a per frequency band basis.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, advantages and features of the present invention will become more apparent upon reading of the following non-restrictive description of an illustrative embodiment thereof, given by way of example only with reference to the accompanying drawings. In the appended drawings:

FIG. 1 is a schematic block diagram of a speech communication system including noise reduction;

FIG. 2 shows an illustration of windowing in spectral analysis;

FIG. 3 gives an overview of an illustrative embodiment of a noise reduction algorithm; and

FIG. 4 is a schematic block diagram of an illustrative embodiment of class-specific noise reduction, where the reduction algorithm depends on the nature of the speech frame being processed.

DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS

In the present specification, efficient techniques for noise reduction are disclosed. The techniques are based at least in part on dividing the amplitude spectrum into critical bands and computing a gain function based on the SNR per critical band, similar to the approach used in the EVRC speech codec (see 3GPP2 C.S0014-0 “Enhanced Variable Rate Codec (EVRC) Service Option for Wideband Spread Spectrum Communication Systems”, 3GPP2 Technical Specification, December 1999). For example, features are disclosed which use different processing techniques based on the nature of the speech frame being processed. In unvoiced frames, per band processing is used in the whole spectrum. In frames where voicing is detected up to a certain frequency, per bin processing is used in the lower portion of the spectrum where voicing is detected, and per band processing is used in the remaining bands. In background noise frames, a constant noise floor is removed by using the same scaling gain in the whole spectrum. Further, a technique is disclosed in which the smoothing of the scaling gain in each band or frequency bin is performed using a smoothing factor that is inversely related to the actual scaling gain (smoothing is stronger for smaller gains). This approach prevents distortion in high SNR speech segments preceded by low SNR frames, as is the case for voiced onsets, for example.

One non-limiting aspect of this invention is to provide novel methods for noise reduction based on spectral subtraction techniques, whereby the noise reduction method depends on the nature of the speech frame being processed. For example, in voiced frames, the processing may be performed on a per bin basis below a certain frequency.

In an illustrative embodiment, noise reduction is performed within a speech encoding system to reduce the level of background noise in the speech signal before encoding. The disclosed techniques can be deployed with either narrowband speech signals sampled at 8000 samples/s or wideband speech signals sampled at 16000 samples/s, or at any other sampling frequency. The encoder used in this illustrative embodiment is based on the AMR-WB codec (see 3GPP TS 26.190, “Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions”), which uses an internal sampling conversion to convert the signal sampling frequency to 12800 samples/s (operating on a 6.4 kHz bandwidth).

Thus the disclosed noise reduction technique in this illustrative embodiment operates on either narrowband or wideband signals after sampling conversion to 12.8 kHz.

In the case of wideband inputs, the input signal has to be decimated from 16 kHz to 12.8 kHz. The decimation is performed by first upsampling by 4, then filtering the output through a lowpass FIR filter with a cutoff frequency of 6.4 kHz, and finally downsampling by 5. The filtering delay is 15 samples at the 16 kHz sampling frequency.

In the case of narrowband inputs, the signal has to be upsampled from 8 kHz to 12.8 kHz. This is performed by first upsampling by 8, then filtering the output through a lowpass FIR filter with a cutoff frequency of 6.4 kHz, and finally downsampling by 5. The filtering delay is 8 samples at the 8 kHz sampling frequency.
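Both conversions share one structure: upsample by an integer factor, lowpass at 6.4 kHz, downsample by 5. Because 16 kHz × 4 and 8 kHz × 8 both give a 64 kHz intermediate rate, a single filter design with a cutoff of 0.1 cycles/sample serves both cases. The text does not give the filter coefficients, so the following is only a pure-Python sketch under stated assumptions: the Hamming-windowed-sinc design is our choice, and the tap counts (121 and 129) are picked so that a linear-phase filter's delay of (taps - 1)/2 samples at 64 kHz equals the quoted 15 samples at 16 kHz and 8 samples at 8 kHz. `design_lowpass` and `resample` are hypothetical names.

```python
import math

def design_lowpass(num_taps: int, cutoff: float, gain: float) -> list[float]:
    """Hamming-windowed-sinc lowpass FIR (assumed design); `cutoff` is in cycles per sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    scale = gain / sum(taps)             # set the DC gain exactly to `gain`
    return [h * scale for h in taps]

def resample(x: list[float], up: int, down: int, num_taps: int) -> list[float]:
    """Upsample by `up` (zero insertion), lowpass at 6.4 kHz, keep every `down`-th sample."""
    # 16 kHz * 4 and 8 kHz * 8 are both 64 kHz, where 6.4 kHz is 0.1 cycles/sample.
    # A gain of `up` compensates for the 1/up amplitude loss caused by zero insertion.
    h = design_lowpass(num_taps, cutoff=6400.0 / 64000.0, gain=float(up))
    y = []
    for m in range(len(x) * up // down):
        n = m * down                     # position on the 64 kHz grid
        acc = 0.0
        # Only taps that line up with an original (non-inserted) sample contribute.
        for k in range(n % up, min(num_taps, n + 1), up):
            acc += h[k] * x[(n - k) // up]
        y.append(acc)
    return y

# Wideband:   resample(x16k, up=4, down=5, num_taps=121)  (delay 60 at 64 kHz = 15 at 16 kHz)
# Narrowband: resample(x8k,  up=8, down=5, num_taps=129)  (delay 64 at 64 kHz = 8 at 8 kHz)
```

The inner loop skips the zeros inserted by upsampling, which is the idea behind a polyphase implementation; a real codec would precompute the polyphase branches and would also compensate for the filter delay.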

After the sampling conversion, two preprocessing functions are applied to the signal prior to the encoding process: high-pass filtering and pre-emphasis.

The high-pass filter serves as a precaution against undesired low frequency components. In this illustrative embodiment, a filter with a cutoff frequency of 50 Hz is used, given by
$$H_{h1}(z) = \frac{0.982910156 - 1.965820313\,z^{-1} + 0.982910156\,z^{-2}}{1 - 1.965820313\,z^{-1} + 0.966308593\,z^{-2}}$$

In the pre-emphasis, a first order high-pass filter is used to emphasize higher frequencies, and it is given by
$$H_{pre\text{-}emph}(z) = 1 - 0.68\,z^{-1}$$

Pre-emphasis is used in the AMR-WB codec to improve the codec performance at high frequencies and to improve perceptual weighting in the error minimization process used in the encoder.
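The two preprocessing filters can be sketched as plain difference equations. The text gives only the transfer functions, so the Direct Form I realization and the zero initial filter states are assumptions, and `biquad`, `high_pass_50hz`, and `pre_emphasis` are hypothetical names.

```python
def biquad(x, b, a):
    """Direct Form I second-order IIR filter; assumes a[0] == 1 and zero initial state."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn                  # shift input history
        y2, y1 = y1, yn                  # shift output history
        y.append(yn)
    return y

# Coefficients of the 50 Hz high-pass filter H_h1(z) at 12.8 kHz, as given in the text.
HP_B = (0.982910156, -1.965820313, 0.982910156)
HP_A = (1.0, -1.965820313, 0.966308593)

def high_pass_50hz(x):
    return biquad(x, HP_B, HP_A)

def pre_emphasis(x, mu=0.68, prev=0.0):
    """H_pre-emph(z) = 1 - mu z^-1; `prev` is the last input sample of the previous frame."""
    y = []
    for xn in x:
        y.append(xn - mu * prev)
        prev = xn
    return y
```

Applied in this order (high-pass, then pre-emphasis), a constant input is driven to nearly zero by the high-pass stage, while a signal alternating at the Nyquist rate passes with gain close to one.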

In the rest of this illustrative embodiment, the signal at the input of the noise reduction algorithm is converted to the 12.8 kHz sampling frequency and preprocessed as described above. However, the disclosed techniques can be equally applied to signals at other sampling frequencies, such as 8 kHz or 16 kHz, with or without preprocessing.

In the following, the noise reduction algorithm is described in detail. The speech encoder in which the noise reduction algorithm is used operates on 20 ms frames containing 256 samples at the 12.8 kHz sampling frequency. Further, the coder uses 13 ms of lookahead from the future frame in its analysis. The noise reduction follows the same framing structure. However, some shift can be introduced between the encoder framing and the noise reduction framing to maximize the use of the lookahead. In this description, the indices of samples reflect the noise reduction framing.

FIG. 1 shows an overview of a speech communication system including noise reduction. In block 101, preprocessing is performed as in the illustrative example described above.

In block 102, spectral analysis and voice activity detection (VAD) are performed. Two spectral analyses are performed in each frame using 20 ms windows with 50% overlap. In block 103, noise reduction is applied to the spectral parameters and then the inverse DFT is used to convert the enhanced signal back to the time domain. An overlap-add operation is then used to reconstruct the signal.

In block 104, linear prediction (LP) analysis and open-loop pitch analysis are performed (usually as a part of the speech coding algorithm). In this illustrative embodiment, the parameters resulting from block 104 are used in the decision to update the noise estimates in the critical bands (block 105). The VAD decision can also be used as the noise update decision. The noise energy estimates updated in block 105 are used in the next frame in the noise reduction block 103 to compute the scaling gains. Block 106 performs speech encoding on the enhanced speech signal. In other applications, block 106 can be an automatic speech recognition system. Note that the functions in block 104 can be an integral part of the speech encoding algorithm.

Spectral Analysis

The discrete Fourier transform is used to perform the spectral analysis and spectrum energy estimation. The frequency analysis is done twice per frame using a 256-point fast Fourier transform (FFT) with 50 percent overlap (as illustrated in FIG. 2). The analysis windows are placed so that all the lookahead is exploited. The beginning of the first window is placed 24 samples after the beginning of the speech encoder current frame. The second window is placed 128 samples further. A square root of a Hanning window (which is equivalent to a sine window) is used to weight the input signal for the frequency analysis. This window is particularly well suited for overlap-add methods (thus this particular spectral analysis is used in the noise suppression algorithm based on spectral subtraction and overlap-add analysis/synthesis). The square-root Hanning window is given by
$$w_{FFT}(n) = \sqrt{0.5 - 0.5\cos\left(\frac{2\pi n}{L_{FFT}}\right)} = \sin\left(\frac{\pi n}{L_{FFT}}\right), \quad n = 0,\ldots,L_{FFT}-1 \qquad (1)$$
where $L_{FFT} = 256$ is the size of the FFT analysis. Note that only half the window is computed and stored since it is symmetric (from 0 to $L_{FFT}/2$).
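A short sketch that checks Equation (1) and shows why the window suits overlap-add. Since $w_{FFT}(n + L_{FFT}/2) = \cos(\pi n / L_{FFT})$, the squared window satisfies $w^2(n) + w^2(n + L_{FFT}/2) = 1$; so if the same window is also applied at synthesis (an assumption about the synthesis side, which this passage does not specify), 50% overlap-add gives unity gain. The half-window storage relies on the symmetry $w(n) = w(L_{FFT} - n)$. The function names are hypothetical.

```python
import math

L_FFT = 256

def sqrt_hanning(length: int = L_FFT) -> list[float]:
    """Square-root Hanning (sine) window of Eq. (1)."""
    return [math.sin(math.pi * n / length) for n in range(length)]

def half_window(length: int = L_FFT) -> list[float]:
    """Stored half: samples 0..L_FFT/2 inclusive (129 values for L_FFT = 256)."""
    return [math.sin(math.pi * n / length) for n in range(length // 2 + 1)]

def window_value(half: list[float], n: int, length: int = L_FFT) -> float:
    """Recover w_FFT(n) from the stored half using the symmetry w(n) = w(L_FFT - n)."""
    return half[n] if n <= length // 2 else half[length - n]
```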

Let s′(n) denote the signal with index 0 corresponding to the first sample in the noise reduction frame (in this illustrative embodiment, it is 24 samples after the beginning of the speech encoder frame). The windowed signals for both spectral analyses are obtained as
$$x_w^{(1)}(n) = w_{FFT}(n)\,s'(n), \quad n = 0,\ldots,L_{FFT}-1$$
$$x_w^{(2)}(n) = w_{FFT}(n)\,s'(n + L_{FFT}/2), \quad n = 0,\ldots,L_{FFT}-1$$
where s′(0) is the first sample in the present noise reduction frame.

The FFT is performed on both windowed signals to obtain two sets of spectral parameters per frame:
$$X^{(1)}(k) = \sum_{n=0}^{L_{FFT}-1} x_w^{(1)}(n)\, e^{-j2\pi kn/L_{FFT}}, \quad k = 0,\ldots,L_{FFT}-1$$
$$X^{(2)}(k) = \sum_{n=0}^{L_{FFT}-1} x_w^{(2)}(n)\, e^{-j2\pi kn/L_{FFT}}, \quad k = 0,\ldots,L_{FFT}-1$$

The output of the FFT gives the real and imaginary parts of the spectrum, denoted by $X_R(k)$, k = 0 to 128, and $X_I(k)$, k = 1 to 127. Note that $X_R(0)$ corresponds to the spectrum at 0 Hz (DC) and $X_R(128)$ corresponds to the spectrum at 6400 Hz. The spectrum at these two points is real-valued only and is usually ignored in the subsequent analysis.
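A sketch of the two windowed analyses and of the extraction of $X_R(k)$ and $X_I(k)$, assuming the input buffer holds the 384 samples s′(0) to s′(383) covered by the two windows. A textbook recursive radix-2 FFT stands in for whatever FFT routine an implementation would actually use; all function names are hypothetical.

```python
import cmath
import math

L_FFT = 256

def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def two_analyses(s: list[float]) -> tuple[list[complex], list[complex]]:
    """X^(1)(k) and X^(2)(k) for one frame; s must hold L_FFT + L_FFT/2 samples."""
    half = L_FFT // 2
    w = [math.sin(math.pi * n / L_FFT) for n in range(L_FFT)]   # Eq. (1)
    x1 = [w[n] * s[n] for n in range(L_FFT)]                     # first window
    x2 = [w[n] * s[n + half] for n in range(L_FFT)]              # second window, 128 samples later
    return fft(x1), fft(x2)

def real_imag_parts(X: list[complex]) -> tuple[list[float], list[float]]:
    """X_R(k) for k = 0..128 and X_I(k) for k = 1..127 (index 0 of X_I is a placeholder)."""
    half = L_FFT // 2
    x_r = [X[k].real for k in range(half + 1)]
    x_i = [0.0] + [X[k].imag for k in range(1, half)]
    return x_r, x_i
```

Because the input is real, bins 129 to 255 are complex conjugates of bins 127 down to 1, which is why only k = 0 to 128 are kept.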

After FFT analysis, the resulting spectrum is divided into criticalbands using the intervals having the following upper limits (20 bands inthe frequency range 0-6400 Hz):

Critical bands={100.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0,1080.0, 1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0,4400.0, 5300.0, 6350.0} Hz.

See D. Johnston, “Transform coding of audio signals using perceptual noise criteria,” IEEE J. Select. Areas Commun., vol. 6, pp. 314-323, February 1988. The 256-point FFT results in a frequency resolution of 50 Hz (6400/128). Thus, after ignoring the DC component of the spectrum, the number of frequency bins per critical band is $M_{CB}$ = {2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 9, 11, 14, 18, 21}, respectively.
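The bin counts above can be reproduced by assigning each 50 Hz bin k (centered at k × 50 Hz) to the first critical band whose upper limit is at least that frequency. This assignment rule is our reading of the table, not something the text states. The sketch below derives both $M_{CB}$ and the first-bin indices $j_i$ used in Equation (2); `band_layout` is a hypothetical name.

```python
UPPER_LIMITS_HZ = [100.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0, 1270.0,
                   1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0, 5300.0, 6350.0]
BIN_WIDTH_HZ = 6400.0 / 128   # 50 Hz for a 256-point FFT at 12.8 kHz

def band_layout(upper_limits=UPPER_LIMITS_HZ, bin_width=BIN_WIDTH_HZ, num_bins=127):
    """Return (M_CB, j_i): bins per band and first bin of each band, DC (bin 0) excluded."""
    counts = [0] * len(upper_limits)
    band = 0
    for k in range(1, num_bins + 1):             # bin k is centered at k * 50 Hz
        while k * bin_width > upper_limits[band]:
            band += 1                             # frequency is past this band's upper limit
        counts[band] += 1
    first_bins, start = [], 1
    for count in counts:
        first_bins.append(start)
        start += count
    return counts, first_bins
```

The first 17 bands hold 74 bins (up to 3700 Hz), and all 20 bands together cover bins 1 to 127.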

The average energy in a critical band is computed as
$$E_{CB}(i) = \frac{1}{(L_{FFT}/2)^2\, M_{CB}(i)} \sum_{k=0}^{M_{CB}(i)-1} \left( X_R^2(k + j_i) + X_I^2(k + j_i) \right), \quad i = 0,\ldots,19 \qquad (2)$$
where $X_R(k)$ and $X_I(k)$ are, respectively, the real and imaginary parts of the kth frequency bin and $j_i$ is the index of the first bin in the ith critical band, given by $j_i$ = {1, 3, 5, 7, 9, 11, 13, 16, 19, 22, 26, 30, 35, 41, 47, 55, 64, 75, 89, 107}.

The spectral analysis module also computes the energy per frequency bin, $E_{BIN}(k)$, for the first 17 critical bands (74 bins, excluding the DC component):
$$E_{BIN}(k) = X_R^2(k) + X_I^2(k), \quad k = 1,\ldots,74 \qquad (3)$$

Finally, the spectral analysis module computes the average total energy for both FFT analyses in a 20 ms frame by adding the average critical band energies $E_{CB}$. That is, the spectrum energy for a certain spectral analysis is computed as
$$E_{frame} = \sum_{i=0}^{19} E_{CB}(i) \qquad (4)$$
and the total frame energy is computed as the average of the spectrum energies of both spectral analyses in a frame. That is,
$$E_t = 10\log_{10}\left(0.5\left(E_{frame}(0) + E_{frame}(1)\right)\right) \text{ dB} \qquad (5)$$
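Equations (2) to (5) translate directly into code. A minimal sketch, assuming spectra in the layout produced by the FFT step ($X_R$ with 129 entries for k = 0 to 128, $X_I$ with 128 entries whose index 0 is unused) and keying $E_{BIN}$ by bin number 1 to 74, which matches the bin ranges used later in Equations (24) and (25). "log" is taken as the base-10 logarithm, as usual for dB quantities. Function names are hypothetical.

```python
import math

L_FFT = 256
M_CB = [2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 9, 11, 14, 18, 21]
J_I = [1, 3, 5, 7, 9, 11, 13, 16, 19, 22, 26, 30, 35, 41, 47, 55, 64, 75, 89, 107]

def band_energies(x_r, x_i):
    """E_CB(i), Eq. (2): mean bin energy per band, normalized by (L_FFT/2)^2."""
    norm = (L_FFT / 2) ** 2
    return [sum(x_r[j + k] ** 2 + x_i[j + k] ** 2 for k in range(m)) / (norm * m)
            for m, j in zip(M_CB, J_I)]

def bin_energies(x_r, x_i):
    """E_BIN(k), Eq. (3), for the 74 bins of the first 17 bands, keyed by bin number."""
    return {k: x_r[k] ** 2 + x_i[k] ** 2 for k in range(1, 75)}

def total_frame_energy(e_cb_first, e_cb_second):
    """E_t in dB, Eqs. (4)-(5): average of the two analyses' spectrum energies."""
    e_frame = [sum(e_cb_first), sum(e_cb_second)]   # Eq. (4) for each analysis
    return 10.0 * math.log10(0.5 * (e_frame[0] + e_frame[1]))
```

For example, a flat spectrum with $X_R(k) = 128$ and $X_I(k) = 0$ gives $E_{CB}(i) = 1$ in every band, so $E_t = 10\log_{10} 20 \approx 13.01$ dB.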

The output parameters of the spectral analysis module, that is, the average energy per critical band, the energy per frequency bin, and the total energy, are used in the VAD, noise reduction, and rate selection modules.

Note that for narrowband inputs sampled at 8000 samples/s, after sampling conversion to 12800 samples/s, there is no content at either end of the spectrum; thus the lowest critical band as well as the three highest bands are not considered in the computation of the output parameters (only bands from i = 1 to 16 are considered).

Voice Activity Detection

The spectral analysis described above is performed twice per frame. Let $E_{CB}^{(1)}(i)$ and $E_{CB}^{(2)}(i)$ denote the energy per critical band for the first and second spectral analysis, respectively (as computed in Equation (2)). The average energy per critical band for the whole frame and part of the previous frame is computed as
$$E_{av}(i) = 0.2\,E_{CB}^{(0)}(i) + 0.4\,E_{CB}^{(1)}(i) + 0.4\,E_{CB}^{(2)}(i) \qquad (6)$$
where $E_{CB}^{(0)}(i)$ denotes the energy per critical band from the second analysis of the previous frame. The signal-to-noise ratio (SNR) per critical band is then computed as
$$\mathrm{SNR}_{CB}(i) = \frac{E_{av}(i)}{N_{CB}(i)}, \quad \text{bounded by } \mathrm{SNR}_{CB}(i) \ge 1 \qquad (7)$$
where $N_{CB}(i)$ is the estimated noise energy per critical band, as will be explained in the next section. The average SNR per frame is then computed as
$$\mathrm{SNR}_{av} = 10\log_{10}\left(\sum_{i=b_{min}}^{b_{max}} \mathrm{SNR}_{CB}(i)\right) \qquad (8)$$
where $b_{min} = 0$ and $b_{max} = 19$ in the case of wideband signals, and $b_{min} = 1$ and $b_{max} = 16$ in the case of narrowband signals.

The voice activity is detected by comparing the average SNR per frame to a threshold that is a function of the long-term SNR. The long-term SNR is given by
$$\mathrm{SNR}_{LT} = \bar{E}_f - \bar{N}_f \qquad (9)$$
where $\bar{E}_f$ and $\bar{N}_f$ are computed using Equations (12) and (13), respectively, which are described later. The initial value of $\bar{E}_f$ is 45 dB.

The threshold is a piece-wise linear function of the long-term SNR. Twofunctions are used, one for clean speech and one for noisy speech.

For wideband signals, if $\mathrm{SNR}_{LT} < 35$ (noisy speech), then
$$th_{VAD} = 0.4346\,\mathrm{SNR}_{LT} + 13.9575$$
else (clean speech)
$$th_{VAD} = 1.0333\,\mathrm{SNR}_{LT} - 7$$

For narrowband signals, if $\mathrm{SNR}_{LT} < 29.6$ (noisy speech), then
$$th_{VAD} = 0.313\,\mathrm{SNR}_{LT} + 14.6$$
else (clean speech)
$$th_{VAD} = 1.0333\,\mathrm{SNR}_{LT} - 7$$

Further, hysteresis is added to the VAD decision to prevent frequent switching at the end of an active speech period. It is applied if the frame is in a soft hangover period or if the last frame is an active speech frame. The soft hangover period consists of the first 10 frames after each active speech burst longer than 2 consecutive frames. In the case of noisy speech ($\mathrm{SNR}_{LT} < 35$), the hysteresis decreases the VAD decision threshold as
$$th_{VAD} = 0.95\,th_{VAD}$$

In the case of clean speech, the hysteresis decreases the VAD decision threshold as
$$th_{VAD} = th_{VAD} - 11$$

If the average SNR per frame is larger than the VAD decision threshold, that is, if $\mathrm{SNR}_{av} > th_{VAD}$, then the frame is declared as an active speech frame and the VAD flag and a local VAD flag are set to 1. Otherwise, the VAD flag and the local VAD flag are set to 0. However, in the case of noisy speech, the VAD flag is forced to 1 in hard hangover frames, i.e., one or two inactive frames following a speech period longer than 2 consecutive frames (the local VAD flag is then equal to 0 but the VAD flag is forced to 1).
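The VAD logic above can be sketched as follows. It covers Equations (6) to (8), the piecewise-linear threshold, and the hysteresis, selecting the hysteresis rule by $\mathrm{SNR}_{LT} < 35$ for both bandwidths, as the text states. The hangover bookkeeping (counting soft and hard hangover frames) is assumed to live in the caller and is passed in as a boolean; all names are hypothetical.

```python
import math

def snr_per_band(e_cb_prev, e_cb_first, e_cb_second, n_cb):
    """SNR_CB(i), Eqs. (6)-(7): weighted band energy over noise energy, floored at 1."""
    return [max((0.2 * a + 0.4 * b + 0.4 * c) / n, 1.0)
            for a, b, c, n in zip(e_cb_prev, e_cb_first, e_cb_second, n_cb)]

def average_snr_db(snr_cb, wideband=True):
    """SNR_av, Eq. (8), summed over bands 0..19 (wideband) or 1..16 (narrowband)."""
    b_min, b_max = (0, 19) if wideband else (1, 16)
    return 10.0 * math.log10(sum(snr_cb[b_min:b_max + 1]))

def vad_threshold(snr_lt, wideband=True, hysteresis=False):
    """th_VAD: piecewise-linear in the long-term SNR (dB), optionally lowered by hysteresis."""
    noisy = snr_lt < (35.0 if wideband else 29.6)
    if noisy:
        th = 0.4346 * snr_lt + 13.9575 if wideband else 0.313 * snr_lt + 14.6
    else:
        th = 1.0333 * snr_lt - 7.0
    if hysteresis:                       # soft hangover, or previous frame was active
        th = 0.95 * th if snr_lt < 35.0 else th - 11.0
    return th

def local_vad_flag(snr_av, th_vad):
    """Local VAD flag; forcing the VAD flag to 1 in hard hangover frames is left to the caller."""
    return 1 if snr_av > th_vad else 0
```

For instance, a wideband signal with $\mathrm{SNR}_{LT} = 20$ dB gets $th_{VAD} = 22.6495$, lowered to about 21.52 when hysteresis applies.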

First Level of Noise Estimation and Update

In this section, the total noise energy, the relative frame energy, the update of the long-term average noise energy and long-term average frame energy, the average energy per critical band, and a noise correction factor are computed. Further, the noise energy initialization and the downward noise update are given.

The total noise energy per frame is given by
$$N_{tot} = 10\log_{10}\left(\sum_{i=0}^{19} N_{CB}(i)\right) \qquad (10)$$
where $N_{CB}(i)$ is the estimated noise energy per critical band.

The relative energy of the frame is given by the difference between the frame energy in dB and the long-term average energy:
$$E_{rel} = E_t - \bar{E}_f \qquad (11)$$
where $E_t$ is given in Equation (5).

Either the long-term average noise energy or the long-term average frame energy is updated in every frame. In the case of active speech frames (VAD flag = 1), the long-term average frame energy is updated using the relation
$$\bar{E}_f = 0.99\,\bar{E}_f + 0.01\,E_t \qquad (12)$$
with initial value $\bar{E}_f$ = 45 dB.

In the case of inactive speech frames (VAD flag = 0), the long-term average noise energy is updated by
$$\bar{N}_f = 0.99\,\bar{N}_f + 0.01\,N_{tot} \qquad (13)$$

The initial value of $\bar{N}_f$ is set equal to $N_{tot}$ for the first 4 frames. Further, in the first 4 frames, the value of $\bar{E}_f$ is bounded by $\bar{E}_f \ge N_{tot} + 10$.
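A sketch of the per-frame bookkeeping of Equations (9) to (13). The text does not fix the order of operations within a frame, so as an assumption $E_{rel}$ is computed before the long-term update, and the start-up rules for the first 4 frames are applied after it. The class name and its interface are hypothetical.

```python
import math

class LongTermEnergies:
    """Long-term frame energy and noise energy in dB, Eqs. (9)-(13)."""

    def __init__(self):
        self.e_f = 45.0      # long-term average frame energy, initial value 45 dB
        self.n_f = None      # long-term average noise energy, set during start-up
        self.frame = 0

    def update(self, e_t, n_cb, vad_flag):
        """Process one frame; returns (N_tot, E_rel)."""
        n_tot = 10.0 * math.log10(sum(n_cb))            # Eq. (10)
        e_rel = e_t - self.e_f                           # Eq. (11)
        if vad_flag:
            self.e_f = 0.99 * self.e_f + 0.01 * e_t      # Eq. (12), active frames
        elif self.n_f is not None:
            self.n_f = 0.99 * self.n_f + 0.01 * n_tot    # Eq. (13), inactive frames
        if self.frame < 4:                               # start-up: first 4 frames
            self.n_f = n_tot
            self.e_f = max(self.e_f, n_tot + 10.0)
        self.frame += 1
        return n_tot, e_rel

    def long_term_snr(self):
        """SNR_LT, Eq. (9)."""
        return self.e_f - self.n_f
```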

Frame Energy per Critical Band, Noise Initialization, and Noise Update Downward:

The frame energy per critical band for the whole frame is computed by averaging the energies from both spectral analyses in the frame. That is,
$$\bar{E}_{CB}(i) = 0.5\,E_{CB}^{(1)}(i) + 0.5\,E_{CB}^{(2)}(i) \qquad (14)$$

The noise energy per critical band $N_{CB}(i)$ is initialized to 0.03. However, in the first 5 frames, if the signal energy is not too high and the signal does not have strong high frequency components, the noise energy is initialized using the energy per critical band so that the noise reduction algorithm can be efficient from the very beginning of the processing. Two high frequency ratios are computed: $r_{15,16}$ is the ratio between the average energy of critical bands 15 and 16 and the average energy in the first 10 bands (mean of both spectral analyses), and $r_{18,19}$ is the same but for bands 18 and 19.

In the first 5 frames, if $E_t < 49$ and $r_{15,16} < 2$ and $r_{18,19} < 1.5$, then for the first 3 frames
$$N_{CB}(i) = \bar{E}_{CB}(i), \quad i = 0,\ldots,19 \qquad (15)$$
and for the following two frames $N_{CB}(i)$ is updated by
$$N_{CB}(i) = 0.33\,N_{CB}(i) + 0.66\,\bar{E}_{CB}(i), \quad i = 0,\ldots,19 \qquad (16)$$

For the following frames, at this stage, only a downward noise energy update is performed, for the critical bands where the energy is less than the background noise energy. First, the temporary updated noise energy is computed as
$$N_{tmp}(i) = 0.9\,N_{CB}(i) + 0.1\left(0.25\,E_{CB}^{(0)}(i) + 0.75\,\bar{E}_{CB}(i)\right) \qquad (17)$$
where $E_{CB}^{(0)}(i)$ corresponds to the second spectral analysis from the previous frame.

Then, for i = 0 to 19, if $N_{tmp}(i) < N_{CB}(i)$ then $N_{CB}(i) = N_{tmp}(i)$.

A second level of noise update is performed later by setting $N_{CB}(i) = N_{tmp}(i)$ if the frame is declared as an inactive frame. The reason for splitting the noise energy update into two parts is that the noise update can be executed only during inactive speech frames, so all the parameters necessary for the speech activity decision are needed. These parameters, however, depend on the LP analysis and open-loop pitch analysis, which are executed on the denoised speech signal. For the noise reduction algorithm to have as accurate a noise estimate as possible, the noise estimate is thus updated downwards before the noise reduction is executed, and upwards later on if the frame is inactive. The downward noise update is safe and can be done independently of the speech activity.
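The start-up initialization and the two-stage update can be sketched as below. As assumptions: frames are counted from 0, the high-frequency ratios are computed from $\bar{E}_{CB}(i)$ of Equation (14), and the caller keeps the returned $N_{tmp}(i)$ so that it can set $N_{CB}(i) = N_{tmp}(i)$ once the frame is declared inactive. Function names are hypothetical.

```python
def frame_band_energy(e_cb_first, e_cb_second):
    """Eq. (14): mean of the two spectral analyses, per critical band."""
    return [0.5 * a + 0.5 * b for a, b in zip(e_cb_first, e_cb_second)]

def high_freq_ratios(e_bar):
    """(r_15,16, r_18,19): mean energy of bands 15-16 and 18-19 over the mean of bands 0-9."""
    low = sum(e_bar[:10]) / 10.0
    return 0.5 * (e_bar[15] + e_bar[16]) / low, 0.5 * (e_bar[18] + e_bar[19]) / low

def initialize_noise(frame, n_cb, e_bar, e_t):
    """Start-up noise estimate, Eqs. (15)-(16); `frame` counts from 0, N_CB starts at 0.03."""
    r_15_16, r_18_19 = high_freq_ratios(e_bar)
    if frame < 5 and e_t < 49 and r_15_16 < 2 and r_18_19 < 1.5:
        if frame < 3:
            return list(e_bar)                                       # Eq. (15)
        return [0.33 * n + 0.66 * e for n, e in zip(n_cb, e_bar)]    # Eq. (16)
    return list(n_cb)                                                # keep current estimate

def downward_update(n_cb, e_cb_prev, e_bar):
    """Eq. (17) plus the 'only go down' rule; also returns N_tmp for the inactive-frame update."""
    n_tmp = [0.9 * n + 0.1 * (0.25 * e0 + 0.75 * e)
             for n, e0, e in zip(n_cb, e_cb_prev, e_bar)]
    return [min(n, t) for n, t in zip(n_cb, n_tmp)], n_tmp
```

Taking the element-wise minimum is what makes the first stage safe to run before the VAD decision: it can only lower the noise estimate, never raise it.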

Noise Reduction:

Noise reduction is applied in the frequency domain, and the denoised signal is then reconstructed using overlap-add. The reduction is performed by scaling the spectrum in each critical band with a scaling gain limited between $g_{min}$ and 1 and derived from the signal-to-noise ratio (SNR) in that critical band. A new feature in the noise suppression is that for frequencies lower than a certain frequency related to the signal voicing, the processing is performed on a frequency bin basis rather than on a critical band basis. Thus, a scaling gain derived from the SNR in each bin is applied to that bin (the SNR is computed using the bin energy divided by the noise energy of the critical band including that bin). This new feature preserves the energy at frequencies near the harmonics, preventing distortion, while strongly reducing the noise between the harmonics. This feature can be exploited only for voiced signals and, given the frequency resolution of the frequency analysis used, for signals with a relatively short pitch period (the harmonic spacing, equal to the fundamental frequency, must be large compared with the 50 Hz bin spacing). However, these are precisely the signals where the noise between harmonics is most perceptible.

FIG. 3 shows an overview of the disclosed procedure. In block 301, spectral analysis is performed. Block 302 verifies whether the number of voiced critical bands, K, is larger than 0. If this is the case, noise reduction is performed in block 304, where per bin processing is performed in the first K voiced bands and per band processing is performed in the remaining bands. If K = 0, per band processing is applied to all the critical bands. After noise reduction on the spectrum, block 305 performs the inverse DFT, and an overlap-add operation is used to reconstruct the enhanced speech signal, as will be described later.

The minimum scaling gain $g_{min}$ is derived from the maximum allowed noise reduction in dB, $NR_{max}$. The maximum allowed reduction has a default value of 14 dB. Thus the minimum scaling gain is given by
$$g_{min} = 10^{-NR_{max}/20} \qquad (18)$$
and it is equal to 0.19953 for the default value of 14 dB.

In the case of inactive frames with VAD = 0, the same scaling is applied over the whole spectrum and is given by $g_s = 0.9\,g_{min}$ if noise suppression is activated (i.e., if $g_{min}$ is lower than 1). That is, the scaled real and imaginary components of the spectrum are given by
$$X'_R(k) = g_s\,X_R(k),\ k = 1,\ldots,128, \quad \text{and} \quad X'_I(k) = g_s\,X_I(k),\ k = 1,\ldots,127 \qquad (19)$$

Note that for narrowband inputs, the upper limits in Equation (19) areset to 79 (up to 3950 Hz).
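A sketch of Equations (18) and (19) for inactive frames, assuming the spectrum layout used for the FFT output ($X_R$ indexed 0 to 128, $X_I$ indexed 0 to 127 with index 0 unused). For narrowband inputs the loop stops at bin 79 as stated above. Function names are hypothetical.

```python
def min_gain(nr_max_db: float = 14.0) -> float:
    """Eq. (18): g_min = 10^(-NR_max / 20)."""
    return 10.0 ** (-nr_max_db / 20.0)

def scale_inactive_frame(x_r, x_i, nr_max_db=14.0, wideband=True):
    """Eq. (19): apply g_s = 0.9 g_min to every bin of a VAD = 0 frame."""
    g_min = min_gain(nr_max_db)
    y_r, y_i = list(x_r), list(x_i)
    if g_min >= 1.0:                      # noise suppression not activated
        return y_r, y_i
    g_s = 0.9 * g_min
    last = 128 if wideband else 79        # narrowband: up to 3950 Hz
    for k in range(1, last + 1):
        y_r[k] *= g_s
        if k <= 127:                      # X_I has no component at 6400 Hz
            y_i[k] *= g_s
    return y_r, y_i
```

With the default $NR_{max}$ = 14 dB, every scaled bin is attenuated by $0.9 \times 0.19953 \approx 0.1796$, i.e. about 14.9 dB.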

For active frames, the scaling gain is computed related to the SNR percritical band or per bin for the first voiced bands. If K_(VOIC)>0 thenper bin noise suppression is performed on the first K_(VOIC) bands. Perband noise suppression is used on the rest of the bands. In caseK_(VOIC)=0 per band noise suppression is used on the whole spectrum. Thevalue of K_(VOIC) is updated as will be described later. The maximumvalue of K_(VOIC) is 17, therefore per bin processing can be appliedonly on the first 17 critical bands corresponding to a maximum frequencyof 3700 Hz. The maximum number of bins for which per bin processing canbe used is 74 (the number of bins in the first 17 bands). An exceptionis made for hard hangover frames that will be described later in thissection.

In an alternative implementation, the value of K_(VOIC) may be fixed. In this case, in all types of speech frames, per bin processing is performed up to a certain band and per band processing is applied to the other bands.

The scaling gain in a certain critical band, or for a certain frequency bin, is computed as a function of SNR and given by
$(g_s)^2 = k_s\,SNR + c_s, \quad \text{bounded by } g_{min} \le g_s \le 1$   (20)

The values of k_(s) and c_(s) are determined such that g_(s)=g_(min) for SNR=1, and g_(s)=1 for SNR=45. That is, for SNRs at 1 dB and lower, the scaling is limited to g_(min), and for SNRs at 45 dB and higher, no noise suppression is performed in the given critical band (g_(s)=1). Thus, given these two end points, the values of k_(s) and c_(s) in Equation (20) are given by
$k_s = \frac{1 - g_{min}^2}{44}, \quad c_s = \frac{45\,g_{min}^2 - 1}{44}$   (21)

The variable SNR in Equation (20) is either the SNR per critical band, SNR_(CB)(i), or the SNR per frequency bin, SNR_(BIN)(k), depending on the type of processing.
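Equations (20) and (21) can be sketched in Python as follows. The sketch applies the bound g_(min) ≦ g_(s) ≦ 1 to the squared gain before taking the square root, so the root is never taken of a value below g_(min)². The text describes the end points in dB while Equations (22) to (25) produce energy ratios; the sketch simply applies Equation (20) to whatever SNR value it receives. Function names are hypothetical.

```python
import math

def gain_coefficients(g_min: float):
    """Equation (21): k_s and c_s so that g_s = g_min at SNR = 1 and g_s = 1 at SNR = 45."""
    k_s = (1.0 - g_min ** 2) / 44.0
    c_s = (45.0 * g_min ** 2 - 1.0) / 44.0
    return k_s, c_s

def scaling_gain(snr: float, g_min: float) -> float:
    """Equation (20): g_s**2 = k_s * SNR + c_s, bounded by g_min <= g_s <= 1."""
    k_s, c_s = gain_coefficients(g_min)
    g_sq = k_s * snr + c_s
    g_sq = min(max(g_sq, g_min ** 2), 1.0)   # clamp in the squared domain
    return math.sqrt(g_sq)
```

For example, with g_(min) = 10^(−0.7), `scaling_gain(1.0, g_min)` returns g_(min) and any SNR of 45 or more returns 1.0.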

The SNR per critical band is computed in case of the first spectral analysis in the frame as
$SNR_{CB}(i) = \frac{0.2\,E_{CB}^{(0)}(i) + 0.6\,E_{CB}^{(1)}(i) + 0.2\,E_{CB}^{(2)}(i)}{N_{CB}(i)}, \quad i = 0,\ldots,19$   (22)
and for the second spectral analysis, the SNR is computed as
$SNR_{CB}(i) = \frac{0.4\,E_{CB}^{(1)}(i) + 0.6\,E_{CB}^{(2)}(i)}{N_{CB}(i)}, \quad i = 0,\ldots,19$   (23)
where $E_{CB}^{(1)}(i)$ and $E_{CB}^{(2)}(i)$ denote the energy per critical band for the first and second spectral analyses, respectively (as computed in Equation (2)), $E_{CB}^{(0)}(i)$ denotes the energy per critical band from the second analysis of the previous frame, and N_(CB)(i) denotes the noise energy estimate per critical band.

The SNR per frequency bin in a certain critical band i is computed in case of the first spectral analysis in the frame as
$SNR_{BIN}(k) = \frac{0.2\,E_{BIN}^{(0)}(k) + 0.6\,E_{BIN}^{(1)}(k) + 0.2\,E_{BIN}^{(2)}(k)}{N_{CB}(i)}, \quad k = j_i,\ldots,j_i + M_{CB}(i) - 1$   (24)
and for the second spectral analysis, the SNR is computed as
$SNR_{BIN}(k) = \frac{0.4\,E_{BIN}^{(1)}(k) + 0.6\,E_{BIN}^{(2)}(k)}{N_{CB}(i)}, \quad k = j_i,\ldots,j_i + M_{CB}(i) - 1$   (25)
where $E_{BIN}^{(1)}(k)$ and $E_{BIN}^{(2)}(k)$ denote the energy per frequency bin for the first and second spectral analyses, respectively (as computed in Equation (3)), $E_{BIN}^{(0)}(k)$ denotes the energy per frequency bin from the second analysis of the previous frame, N_(CB)(i) denotes the noise energy estimate per critical band, j_(i) is the index of the first bin in the ith critical band, and M_(CB)(i) is the number of bins in critical band i, as defined above.
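The weighting in Equations (22) to (25) is the same for bands and bins; only the energies differ. A minimal vectorized sketch, assuming NumPy arrays and a hypothetical helper that repeats N_(CB)(i) over the M_(CB)(i) bins of band i so the per-bin form can share the same code:

```python
import numpy as np

def weighted_snr(e_prev, e_first, e_second, noise, analysis):
    """SNR from the energies of overlapping analyses (Equations (22)-(25)).

    e_prev   : energy from the second analysis of the previous frame, E^(0)
    e_first  : energy from the first analysis of the current frame, E^(1)
    e_second : energy from the second analysis of the current frame, E^(2)
    noise    : N_CB(i), per band, or repeated per bin via noise_per_bin()
    analysis : 1 for the first spectral analysis, 2 for the second
    """
    e0, e1, e2 = (np.asarray(a, dtype=float) for a in (e_prev, e_first, e_second))
    if analysis == 1:
        num = 0.2 * e0 + 0.6 * e1 + 0.2 * e2   # Equations (22) / (24)
    elif analysis == 2:
        num = 0.4 * e1 + 0.6 * e2              # Equations (23) / (25)
    else:
        raise ValueError("analysis must be 1 or 2")
    return num / np.asarray(noise, dtype=float)

def noise_per_bin(n_cb, m_cb):
    """Repeat N_CB(i) over the M_CB(i) bins of band i (hypothetical helper)."""
    return np.repeat(np.asarray(n_cb, dtype=float), m_cb)
```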

In case of per critical band processing for a band with index i, after determining the scaling gain as in Equation (20), using the SNR as defined in Equations (22) or (23), the actual scaling is performed using a smoothed scaling gain updated in every frequency analysis as
$g_{CB,LP}(i) = \alpha_{gs}\,g_{CB,LP}(i) + (1 - \alpha_{gs})\,g_s$   (26)

In this invention, a novel feature is disclosed where the smoothing factor is adaptive and is made inversely related to the gain itself. In this illustrative embodiment the smoothing factor is given by α_(gs)=1−g_(s). That is, the smoothing is stronger for smaller gains g_(s). This approach prevents distortion in high SNR speech segments preceded by low SNR frames, as is the case for voiced onsets. For example, in unvoiced speech frames the SNR is low, so a strong scaling (a small gain) is used to reduce the noise in the spectrum. If a voiced onset follows the unvoiced frame, the SNR becomes higher, and if the gain smoothing prevents a speedy update of the scaling gain, then it is likely that a strong scaling will be used on the voiced onset, which will result in poor performance. In the proposed approach, the smoothing procedure is able to adapt quickly and apply weaker scaling (gains closer to unity) on the onset.
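The effect of the adaptive smoothing factor can be seen with a one-step example. The sketch below implements the recursion of Equations (26) and (28) with α_(gs)=1−g_(s) by default; the optional fixed `alpha` argument is a hypothetical addition used only for comparison with conventional fixed smoothing.

```python
def smooth_gain(g_lp_prev: float, g_s: float, alpha=None) -> float:
    """Equations (26)/(28): first-order recursive smoothing of the scaling gain.

    By default alpha_gs = 1 - g_s, so small gains are smoothed strongly and
    gains near unity are tracked almost immediately. A fixed alpha can be
    passed to compare with conventional smoothing.
    """
    a = 1.0 - g_s if alpha is None else alpha
    return a * g_lp_prev + (1.0 - a) * g_s

# Voiced onset after an unvoiced frame: smoothed gain 0.2, new gain 0.9.
adaptive = smooth_gain(0.2, 0.9)          # 0.1*0.2 + 0.9*0.9 = 0.83
fixed = smooth_gain(0.2, 0.9, alpha=0.9)  # 0.9*0.2 + 0.1*0.9 = 0.27
```

With the adaptive factor, the smoothed gain jumps to about 0.83 at the onset, while a fixed factor of 0.9 would leave it near 0.27 and keep attenuating the onset. In the opposite direction (a new gain of 0.2 after a smoothed gain of 0.9), α_(gs)=0.8 and the gain decreases only to 0.76, so attenuation is introduced gradually.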

The scaling in the critical band is performed as
$X'_R(k + j_i) = g_{CB,LP}(i)\,X_R(k + j_i), \quad X'_I(k + j_i) = g_{CB,LP}(i)\,X_I(k + j_i), \quad k = 0,\ldots,M_{CB}(i) - 1$   (27)
where j_(i) is the index of the first bin in the critical band i and M_(CB)(i) is the number of bins in that critical band.

In case of per bin processing in a band with index i, after determining the scaling gain as in Equation (20), using the SNR as defined in Equations (24) or (25), the actual scaling is performed using a smoothed scaling gain updated in every frequency analysis as
$g_{BIN,LP}(k) = \alpha_{gs}\,g_{BIN,LP}(k) + (1 - \alpha_{gs})\,g_s$   (28)
where α_(gs)=1−g_(s), similar to Equation (26).

Temporal smoothing of the gains prevents audible energy oscillations, while controlling the smoothing using α_(gs) prevents distortion in high SNR speech segments preceded by low SNR frames, as is the case for voiced onsets, for example.

The scaling in the critical band i is performed as
$X'_R(k + j_i) = g_{BIN,LP}(k + j_i)\,X_R(k + j_i), \quad X'_I(k + j_i) = g_{BIN,LP}(k + j_i)\,X_I(k + j_i), \quad k = 0,\ldots,M_{CB}(i) - 1$   (29)
where j_(i) is the index of the first bin in the critical band i and M_(CB)(i) is the number of bins in that critical band.

The smoothed scaling gains g_(BIN,LP)(k) and g_(CB,LP)(i) are initially set to 1. Each time an inactive frame is processed (VAD=0), the smoothed gain values are reset to g_(min), defined in Equation (18).

As mentioned above, if K_(VOIC)>0, per bin noise suppression is performed on the first K_(VOIC) bands, and per band noise suppression is performed on the remaining bands using the procedures described above. Note that in every spectral analysis, the smoothed scaling gains g_(CB,LP)(i) are updated for all critical bands (even for voiced bands processed with per bin processing; in this case g_(CB,LP)(i) is updated with the average of the g_(BIN,LP)(k) belonging to band i). Similarly, the scaling gains g_(BIN,LP)(k) are updated for all frequency bins in the first 17 bands (up to bin 74). For those of these 17 bands that are processed with per band processing, the g_(BIN,LP)(k) are updated by setting them equal to g_(CB,LP)(i) of the corresponding band.
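The hybrid scheme and its gain bookkeeping can be sketched for one spectral analysis as below. This is a minimal illustration, not the reference implementation: it assumes NumPy float arrays indexed by absolute bin number (including the per-bin gains, which the text only keeps for the first 74 bins), and the function name and parameters are hypothetical. Equation (20) is repeated inside the function so the block is self-contained.

```python
import numpy as np

def scale_active_frame(x_re, x_im, band_start, band_len, snr_cb, snr_bin,
                       g_cb_lp, g_bin_lp, k_voic, g_min, n_bin_bands=17):
    """Per bin processing on the first k_voic bands, per band on the rest.

    band_start[i] = j_i and band_len[i] = M_CB(i). x_re, x_im, g_cb_lp and
    g_bin_lp must be NumPy float arrays; they are modified in place.
    """
    k_s = (1.0 - g_min ** 2) / 44.0
    c_s = (45.0 * g_min ** 2 - 1.0) / 44.0

    def gain(snr):
        # Equation (20) with g_min <= g_s <= 1 enforced on the squared gain
        return np.sqrt(np.clip(k_s * np.asarray(snr, dtype=float) + c_s, g_min ** 2, 1.0))

    for i, (j, m) in enumerate(zip(band_start, band_len)):
        bins = slice(j, j + m)
        if i < k_voic:
            g_s = gain(snr_bin[bins])                                      # one gain per bin
            alpha = 1.0 - g_s                                              # adaptive factor
            g_bin_lp[bins] = alpha * g_bin_lp[bins] + (1.0 - alpha) * g_s  # Eq. (28)
            g_cb_lp[i] = g_bin_lp[bins].mean()                             # band gain = bin average
            x_re[bins] *= g_bin_lp[bins]                                   # Eq. (29)
            x_im[bins] *= g_bin_lp[bins]
        else:
            g_s = float(gain(snr_cb[i]))                                   # one gain per band
            alpha = 1.0 - g_s
            g_cb_lp[i] = alpha * g_cb_lp[i] + (1.0 - alpha) * g_s          # Eq. (26)
            if i < n_bin_bands:
                g_bin_lp[bins] = g_cb_lp[i]                                # keep bin gains in sync
            x_re[bins] *= g_cb_lp[i]                                       # Eq. (27)
            x_im[bins] *= g_cb_lp[i]
```

Keeping both gain sets up to date in every analysis means that when K_(VOIC) changes from one frame to the next, the newly per-bin (or newly per-band) bands start from a smoothed gain that is consistent with the previous frame rather than from a stale value.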

Note that in case of clean speech, noise suppression is not performed in active speech frames (VAD=1). This is detected by finding the maximum noise energy over all critical bands, max(N_(CB)(i)), i=0, . . . , 19; if this value is less than or equal to 15, no noise suppression is performed.

As mentioned above, for inactive frames (VAD=0), a scaling of 0.9g_(min) is applied on the whole spectrum, which is equivalent to removing a constant noise floor. For VAD short-hangover frames (VAD=1 and local_VAD=0), per band processing is applied to the first 10 bands as described above (corresponding to 1700 Hz), and for the rest of the spectrum, a constant noise floor is subtracted by scaling the rest of the spectrum by a constant value g_(min). This measure significantly reduces high-frequency noise energy oscillations. For the bands above the 10th band, the smoothed scaling gains g_(CB,LP)(i) are not reset but updated using Equation (26) with g_(s)=g_(min), and the per bin smoothed scaling gains g_(BIN,LP)(k) are updated by setting them equal to g_(CB,LP)(i) in the corresponding critical bands.

The procedure described above can be seen as a class-specific noise reduction, where the reduction algorithm depends on the nature of the speech frame being processed. This is illustrated in FIG. 4. Block 401 verifies whether the VAD flag is 0 (inactive speech). If so, a constant noise floor is removed from the spectrum by applying the same scaling gain on the whole spectrum (block 402). Otherwise, block 403 verifies whether the frame is a VAD hangover frame. If so, per band processing is used in the first 10 bands and the same scaling gain is used in the remaining bands (block 404). Otherwise, block 405 verifies whether voicing is detected in the first bands in the spectrum. If so, per bin processing is performed in the first K voiced bands and per band processing is performed in the remaining bands (block 406). If no voiced bands are detected, per band processing is performed in all critical bands (block 407).
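The class-specific decision can be summarized as a small dispatch function. This sketch is an illustration under stated assumptions: the enum and function names are hypothetical, and the clean-speech test (maximum noise energy ≦ 15) is assumed to be checked for every frame with VAD=1, including short-hangover frames, before the hangover test.

```python
from enum import Enum

class Mode(Enum):
    NONE = "no suppression (clean speech)"
    CONSTANT_FLOOR = "same gain 0.9*g_min on the whole spectrum"
    HANGOVER = "per band on the first 10 bands, g_min above"
    BIN_AND_BAND = "per bin on the first K_VOIC bands, per band above"
    BAND_ONLY = "per band on all critical bands"

def select_mode(vad: int, local_vad: int, k_voic: int, noise_cb) -> Mode:
    """Pick the noise reduction class for one frame (FIG. 4 logic, sketched)."""
    if vad == 0:
        return Mode.CONSTANT_FLOOR     # inactive frame: remove a constant noise floor
    if max(noise_cb) <= 15:
        return Mode.NONE               # clean speech in an active frame (assumed order)
    if local_vad == 0:
        return Mode.HANGOVER           # short-hangover frame: VAD=1, local_VAD=0
    if k_voic > 0:
        return Mode.BIN_AND_BAND       # voiced low bands detected
    return Mode.BAND_ONLY
```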

In case of processing of narrowband signals (upsampled to 12800 Hz), noise suppression is performed on the first 17 bands (up to 3700 Hz). For the remaining 5 frequency bins between 3700 Hz and 4000 Hz, the spectrum is scaled using the last scaling gain g_(s) at the bin at 3700 Hz. For the remainder of the spectrum (from 4000 Hz to 6400 Hz), the spectrum is zeroed.
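The narrowband post-processing can be sketched as follows. It assumes a 256-point FFT at 12800 Hz, which gives a 50 Hz bin spacing consistent with the bin counts quoted above (bin 74 at 3700 Hz, bins 75 to 79 covering 3750 to 3950 Hz). The gain at the 3700 Hz bin is passed in by the caller, and the function name is hypothetical.

```python
import numpy as np

BIN_HZ = 12800 / 256   # assumed bin spacing: 50 Hz

def finish_narrowband(x_re, x_im, gain_at_3700):
    """Scale the bins between 3700 and 4000 Hz by the gain used at 3700 Hz
    and zero the spectrum from 4000 Hz to 6400 Hz."""
    k_3700 = int(round(3700 / BIN_HZ))   # bin 74
    k_4000 = int(round(4000 / BIN_HZ))   # bin 80
    x_re = np.array(x_re, dtype=float)
    x_im = np.array(x_im, dtype=float)
    x_re[k_3700 + 1:k_4000] *= gain_at_3700   # the 5 bins 75..79
    x_im[k_3700 + 1:k_4000] *= gain_at_3700
    x_re[k_4000:] = 0.0                       # 4000 Hz and above
    x_im[k_4000:] = 0.0
    return x_re, x_im
```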

Reconstruction of Denoised Signal:

After determining the scaled spectral components, X′_(R)(k) and X′_(I)(k), an inverse FFT is applied on the scaled spectrum to obtain the windowed denoised signal in the time domain:
$x_{w,d}(n) = \frac{1}{N}\sum_{k=0}^{N-1} X'(k)\,e^{j2\pi kn/N}, \quad n = 0,\ldots,L_{FFT} - 1$
where X′(k) denotes the scaled complex spectrum formed from X′_(R)(k) and X′_(I)(k).

This is repeated for both spectral analyses in the frame to obtain the denoised windowed signals $x_{w,d}^{(1)}(n)$ and $x_{w,d}^{(2)}(n)$. For every half frame, the signal is reconstructed using an overlap-add operation for the overlapping portions of the analysis. Since a square root Hanning window is used on the original signal prior to spectral analysis, the same window is applied at the output of the inverse FFT prior to the overlap-add operation. Thus, the double-windowed denoised signal is given by
$x_{ww,d}^{(1)}(n) = w_{FFT}(n)\,x_{w,d}^{(1)}(n), \quad n = 0,\ldots,L_{FFT} - 1$
$x_{ww,d}^{(2)}(n) = w_{FFT}(n)\,x_{w,d}^{(2)}(n), \quad n = 0,\ldots,L_{FFT} - 1$   (30)

For the first half of the analysis window, the overlap-add operation for constructing the denoised signal is performed as
$s(n) = x_{ww,d}^{(0)}(n + L_{FFT}/2) + x_{ww,d}^{(1)}(n), \quad n = 0,\ldots,L_{FFT}/2 - 1$
and for the second half of the analysis window, the overlap-add operation for constructing the denoised signal is performed as
$s(n + L_{FFT}/2) = x_{ww,d}^{(1)}(n + L_{FFT}/2) + x_{ww,d}^{(2)}(n), \quad n = 0,\ldots,L_{FFT}/2 - 1$
where $x_{ww,d}^{(0)}(n)$ is the double-windowed denoised signal from the second analysis in the previous frame.
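Why applying the square root Hanning window twice allows exact reconstruction can be checked numerically. The sketch below assumes a periodic square root Hanning window (this section does not say whether the window is periodic or symmetric) and processes a whole signal with 50% overlap instead of the frame-by-frame x^(0), x^(1), x^(2) bookkeeping; each accumulation into `out` performs the same sum as the two overlap-add equations above. With a unity gain, the interior of the output equals the input because the squared window sums to 1 at 50% overlap. Function names are hypothetical.

```python
import numpy as np

def sqrt_hann(L):
    """Assumed periodic square root Hanning window: w(n)**2 + w(n + L/2)**2 == 1."""
    n = np.arange(L)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / L))

def denoise_and_overlap_add(x, L=256, gain_fn=lambda X: X):
    """Windowed FFT analysis, spectral scaling, inverse FFT, re-windowing
    (Equation (30)) and overlap-add with a hop of L/2."""
    w = sqrt_hann(L)
    hop = L // 2
    out = np.zeros(len(x))
    for start in range(0, len(x) - L + 1, hop):
        X = np.fft.rfft(w * x[start:start + L])   # spectral analysis
        x_wd = np.fft.irfft(gain_fn(X), n=L)      # windowed denoised signal
        out[start:start + L] += w * x_wd          # double windowing + overlap-add
    return out
```

For example, a 1024-sample random signal passed through `denoise_and_overlap_add` is reproduced exactly on samples 128 to 895, the span covered by two overlapping analyses; the first and last half-windows are incomplete, which mirrors the lookahead situation discussed next.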

Note that with the overlap-add operation, since there is a 24-sample shift between the speech encoder frame and the noise reduction frame, the denoised signal can be reconstructed up to 24 samples into the lookahead, in addition to the present frame. However, another 128 samples are still needed to complete the lookahead needed by the speech encoder for linear prediction (LP) analysis and open-loop pitch analysis. This part is temporarily obtained by inverse windowing the second half of the denoised windowed signal $x_{w,d}^{(2)}(n)$ without performing the overlap-add operation. That is,
$s(n + L_{FFT}) = \frac{x_{ww,d}^{(2)}(n + L_{FFT}/2)}{w_{FFT}^2(n + L_{FFT}/2)}, \quad n = 0,\ldots,L_{FFT}/2 - 1$

Note that this portion of the signal is properly recomputed in the nextframe using overlap-add operation.

Noise Energy Estimates Update

This module updates the noise energy estimates per critical band for noise suppression. The update is performed during inactive speech periods. However, the VAD decision performed above, which is based on the SNR per critical band, is not used for determining whether the noise energy estimates are updated. Instead, another decision is made based on other parameters that are independent of the SNR per critical band. The parameters used for the noise update decision are pitch stability, signal non-stationarity, voicing, and the ratio between the 2nd order and 16th order LP residual error energies; these parameters generally have low sensitivity to noise level variations.

The reason for not using the encoder VAD decision for noise update is to make the noise estimation robust to rapidly changing noise levels. If the encoder VAD decision were used for the noise update, a sudden increase in noise level would cause an increase of SNR even for inactive speech frames, preventing the noise estimator from updating, which in turn would keep the SNR high in the following frames, and so on. Consequently, the noise update would be blocked and some other logic would be needed to resume the noise adaptation.

In this illustrative embodiment, open-loop pitch analysis is performed at the encoder to compute three open-loop pitch estimates per frame: d₀, d₁, and d₂, corresponding to the first half-frame, the second half-frame, and the lookahead, respectively. The pitch stability counter is computed as
$pc = |d_0 - d_{-1}| + |d_1 - d_0| + |d_2 - d_1|$   (31)
where d₋₁ is the lag of the second half-frame of the previous frame. In this illustrative embodiment, for pitch lags larger than 122, the open-loop pitch search module sets d₂=d₁. Thus, for such lags the value of pc in Equation (31) is multiplied by 3/2 to compensate for the missing third term in the equation. The pitch stability is true if the value of pc is less than 12. Further, for frames with low voicing, pc is set to 12 to indicate pitch instability. That is,
If $(C_{norm}(d_0) + C_{norm}(d_1) + C_{norm}(d_2))/3 + r_e < 0.7$ then pc=12   (32)
where C_(norm)(d) is the normalized raw correlation and r_(e) is an optional correction added to the normalized correlation in order to compensate for the decrease of normalized correlation in the presence of background noise. In this illustrative embodiment, the normalized correlation is computed based on the decimated weighted speech signal s_(wd)(n) and given by
$C_{norm}(d) = \frac{\sum_{n=0}^{L_{sec}} s_{wd}(n)\,s_{wd}(n-d)}{\sqrt{\sum_{n=0}^{L_{sec}} s_{wd}^2(n)\,\sum_{n=0}^{L_{sec}} s_{wd}^2(n-d)}}$,

where the summation limit depends on the delay itself. In this illustrative embodiment, the weighted signal used in open-loop pitch analysis is decimated by 2 and the summation limits are given according to:
L_(sec) = 40 for d = 10, . . . , 16
L_(sec) = 40 for d = 17, . . . , 31
L_(sec) = 62 for d = 32, . . . , 61
L_(sec) = 115 for d = 62, . . . , 115
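The pitch stability test and the normalized correlation can be sketched as below. The sketch uses an `origin` index so that s_(wd)(n−d) refers to past samples of a NumPy array; this indexing, and the assumption that the "lags larger than 122" condition is tested on d₁, are illustrative choices rather than details given in the text. Function names are hypothetical.

```python
import numpy as np

def summation_limit(d: int) -> int:
    """L_sec as a function of the (decimated) delay d, per the table above."""
    if 10 <= d <= 31:
        return 40
    if 32 <= d <= 61:
        return 62
    if 62 <= d <= 115:
        return 115
    raise ValueError("delay outside the tabulated range")

def normalized_correlation(s_wd, origin: int, d: int) -> float:
    """C_norm(d); s_wd[origin] plays the role of s_wd(0), so origin must be >= d."""
    n = origin + np.arange(summation_limit(d) + 1)   # n = 0..L_sec inclusive
    x, y = s_wd[n], s_wd[n - d]
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

def pitch_stability_counter(d_prev, d0, d1, d2, c0, c1, c2, r_e=0.0):
    """Equations (31)-(32); pitch is considered stable when the result is < 12."""
    pc = abs(d0 - d_prev) + abs(d1 - d0) + abs(d2 - d1)
    if d1 > 122:      # assumption: d2 was forced equal to d1, so rescale by 3/2
        pc *= 3 / 2
    if (c0 + c1 + c2) / 3 + r_e < 0.7:
        pc = 12       # low voicing: force "unstable"
    return pc
```

For a signal that is exactly periodic with period d, C_(norm)(d) is 1; small lag changes across the half-frames keep pc well below 12.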

The signal non-stationarity estimation is performed based on the product of the ratios between the energy per critical band and the average long-term energy per critical band.

The average long-term energy per critical band is updated by
$E_{CB,LT}(i) = \alpha_e\,E_{CB,LT}(i) + (1 - \alpha_e)\,\bar{E}_{CB}(i), \quad i = b_{min},\ldots,b_{max}$   (33)
where b_(min)=0 and b_(max)=19 in case of wideband signals, and b_(min)=1 and b_(max)=16 in case of narrowband signals, and $\bar{E}_{CB}(i)$ is the frame energy per critical band defined in Equation (14). The update factor α_(e) is a linear function of the total frame energy, defined in Equation (5), and it is given as follows:

For wideband signals: α_(e)=0.0245E_(tot)−0.235, bounded by 0.5≦α_(e)≦0.99.

For narrowband signals: α_(e)=0.00091E_(tot)+0.3185, bounded by 0.5≦α_(e)≦0.999.

The frame non-stationarity is given by the product of the ratios between the frame energy and the average long-term energy per critical band. That is,
$nonstat = \prod_{i=b_{min}}^{b_{max}} \frac{\max(\bar{E}_{CB}(i), E_{CB,LT}(i))}{\min(\bar{E}_{CB}(i), E_{CB,LT}(i))}$   (34)
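Equations (33) and (34) and the update factor α_(e) can be sketched with NumPy as follows. The text does not state whether the non-stationarity of Equation (34) uses the long-term energies before or after the update of Equation (33), so the sketch keeps the two steps as separate functions and leaves the order to the caller. Function names are hypothetical.

```python
import numpy as np

def update_factor(e_tot: float, wideband: bool = True) -> float:
    """alpha_e as a clipped linear function of the total frame energy E_tot."""
    if wideband:
        return float(np.clip(0.0245 * e_tot - 0.235, 0.5, 0.99))
    return float(np.clip(0.00091 * e_tot + 0.3185, 0.5, 0.999))

def band_range(wideband: bool):
    """(b_min, b_max): bands 0..19 for wideband, 1..16 for narrowband."""
    return (0, 19) if wideband else (1, 16)

def update_long_term_energy(e_lt, e_cb, e_tot, wideband=True):
    """Equation (33), applied only to bands b_min..b_max."""
    b_min, b_max = band_range(wideband)
    a = update_factor(e_tot, wideband)
    e_lt = np.array(e_lt, dtype=float)
    sel = slice(b_min, b_max + 1)
    e_lt[sel] = a * e_lt[sel] + (1.0 - a) * np.asarray(e_cb, dtype=float)[sel]
    return e_lt

def nonstationarity(e_cb, e_lt, wideband=True):
    """Equation (34): product of max/min ratios of frame and long-term energies."""
    b_min, b_max = band_range(wideband)
    f = np.asarray(e_cb, dtype=float)[b_min:b_max + 1]
    lt = np.asarray(e_lt, dtype=float)[b_min:b_max + 1]
    return float(np.prod(np.maximum(f, lt) / np.minimum(f, lt)))
```

Because each factor of the product is at least 1, nonstat equals 1 for a frame that matches its long-term spectrum and grows quickly when several bands deviate in either direction.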

The voicing factor for noise update is given by
$voicing = (C_{norm}(d_0) + C_{norm}(d_1))/2 + r_e$   (35)

Finally, the ratio between the LP residual energies after 2nd order and 16th order analysis is given by
$resid\_ratio = E(2)/E(16)$   (36)
where E(2) and E(16) are the LP residual energies after 2nd order and 16th order analysis, computed in the Levinson-Durbin recursion, which is well known to those skilled in the art. This ratio reflects the fact that, to represent a signal spectral envelope, a higher LP order is generally needed for speech signals than for noise. In other words, the difference between E(2) and E(16) is expected to be smaller for noise than for active speech.
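The residual energies E(m) are by-products of the Levinson-Durbin recursion: each order m multiplies the previous error by (1 − k_m²), where k_m is the m-th reflection coefficient. A minimal sketch, assuming the autocorrelation values r(0), ..., r(16) are already available; the function names are hypothetical.

```python
import numpy as np

def levinson_residual_energies(r, order=16):
    """Levinson-Durbin recursion returning the prediction-error energy E(m), m = 0..order."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    energies = [err]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                          # m-th reflection coefficient
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]     # update predictor coefficients
        a[m] = k
        err = err * (1.0 - k * k)               # E(m) = E(m-1) * (1 - k_m**2)
        energies.append(err)
    return energies

def residual_ratio(r):
    """resid_ratio = E(2) / E(16), Equation (36)."""
    e = levinson_residual_energies(r, order=16)
    return e[2] / e[16]
```

For a first-order autoregressive autocorrelation r(k)=0.9^k, all reflection coefficients beyond the first are zero, so E(2)=E(16) and the ratio is 1. This is the noise-like case where a low LP order already captures the envelope. Speech with several formants keeps reducing the residual at higher orders, which pushes the ratio above the thresholds used below.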

The update decision is determined based on a variable noise_update, which is initially set to 6; it is decreased by 1 if an inactive frame is detected and incremented by 2 if an active frame is detected. Further, noise_update is bounded by 0 and 6. The noise energies are updated only when noise_update=0.

The value of the variable noise_update is updated in each frame as follows:
If (nonstat>th_(stat)) OR (pc<12) OR (voicing>0.85) OR (resid_ratio>th_(resid))
    noise_update=noise_update+2
Else
    noise_update=noise_update−1
where for wideband signals, th_(stat)=350000 and th_(resid)=1.9, and for narrowband signals, th_(stat)=500000 and th_(resid)=11.

In other words, frames are declared inactive for noise update when
(nonstat≦th_(stat)) AND (pc≧12) AND (voicing≦0.85) AND (resid_ratio≦th_(resid))
and a hangover of 6 frames is used before noise update takes place.
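The counter logic above, including the copy of the temporary estimates N_(tmp)(i) into N_(CB)(i) that is applied when the counter reaches zero, can be sketched as follows. The function names are hypothetical; the thresholds are the wideband and narrowband values given above.

```python
def next_noise_update(noise_update, nonstat, pc, voicing, resid_ratio, wideband=True):
    """Advance the noise-update hangover counter by one frame (bounded to 0..6)."""
    th_stat, th_resid = (350000.0, 1.9) if wideband else (500000.0, 11.0)
    active = (nonstat > th_stat or pc < 12 or voicing > 0.85
              or resid_ratio > th_resid)
    noise_update = noise_update + 2 if active else noise_update - 1
    return min(max(noise_update, 0), 6)

def maybe_update_noise(n_cb, n_tmp, noise_update):
    """Copy N_tmp(i) into N_CB(i) only when the counter has reached 0."""
    return list(n_tmp) if noise_update == 0 else list(n_cb)
```

Starting from 6, six consecutive noise-like frames bring the counter to 0, which is the 6-frame hangover. A single active frame adds 2, so two more inactive frames are needed before updating resumes.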

Thus, if noise_update=0 then
for i=0 to 19: N_(CB)(i)=N_(tmp)(i)
where N_(tmp)(i) is the temporary updated noise energy already computed in Equation (17).

Update of Voicing Cutoff Frequency:

The cut-off frequency below which a signal is considered voiced is updated. This frequency is used to determine the number of critical bands for which noise suppression is performed using per bin processing.

First, a voicing measure is computed as
$\nu_g = 0.4\,C_{norm}(d_1) + 0.6\,C_{norm}(d_2) + r_e$   (37)
and the voicing cut-off frequency is given by
$f_c = 0.00017118\,e^{17.9772\,\nu_g}, \quad \text{bounded by } 325 \le f_c \le 3700$   (38)

Then, the number of critical bands, K_(voic), having an upper frequency not exceeding f_(c) is determined. The bounds of 325≦f_(c)≦3700 are set such that per bin processing is performed on a minimum of 3 bands and a maximum of 17 bands (refer to the critical band upper limits defined above). Note that in the voicing measure calculation, more weight is given to the normalized correlation of the lookahead, since the determined number of voiced bands will be used in the next frame.

Thus, in the following frame, for the first K_(voic) critical bands, the noise suppression will use per bin processing as described above.

Note that for frames with low voicing and for large pitch delays, only per critical band processing is used and thus K_(voic) is set to 0. The following condition is used:
If (0.4C_(norm)(d₁)+0.6C_(norm)(d₂)≦0.72) OR (d₁>116) OR (d₂>116) then K_(voic)=0.
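Equations (37) and (38) and the K_(voic)=0 override can be combined into one small function. The critical band upper limits are defined earlier in the document and are not reproduced in this section. The list below is an assumed, illustrative set that satisfies the stated bounds (3 bands up to 325 Hz, 17 bands up to 3700 Hz); the function name is hypothetical. As in the condition above, the override test uses the correlations without r_(e).

```python
import math

# Assumed critical band upper limits in Hz (illustrative; see the definitions above).
BAND_UPPER_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6350]

def voiced_band_count(c_d1, c_d2, d1, d2, r_e=0.0, band_upper=BAND_UPPER_HZ):
    """Number of bands K_voic processed per bin in the next frame."""
    if 0.4 * c_d1 + 0.6 * c_d2 <= 0.72 or d1 > 116 or d2 > 116:
        return 0                                            # low voicing or long lag
    nu_g = 0.4 * c_d1 + 0.6 * c_d2 + r_e                    # Equation (37)
    f_c = 0.00017118 * math.exp(17.9772 * nu_g)             # Equation (38)
    f_c = min(max(f_c, 325.0), 3700.0)
    return sum(1 for f in band_upper if f <= f_c)           # bands below the cut-off
```

Because the exponent is steep, f_(c) moves from the 325 Hz floor to the 3700 Hz ceiling over a narrow range of ν_(g). For example, ν_(g)=0.73 still yields the minimum of 3 bands, while ν_(g)=1 saturates at 17 bands.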

Of course, many other modifications and variations are possible. In view of the above detailed illustrative description of embodiments of this invention and associated drawings, such other modifications and variations will now become apparent to those of ordinary skill in the art. It should also be apparent that such other variations may be effected without departing from the spirit and scope of the present invention.

1. A method for noise suppression of a speech signal, comprising: for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, determining a value of a scaling gain for at least some of said frequency bins; and calculating smoothed scaling gain values, comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain.
 2. A method as in claim 1, where determining the value of the scaling gain comprises using a signal-to-noise ratio (SNR).
 3. A method as in claim 1, where calculating a smoothed scaling gain value uses a smoothing factor having a value that is inversely related to the scaling gain.
 4. A method as in claim 1, where calculating a smoothed scaling gain uses a smoothing factor having a value determined so that smoothing is stronger for smaller values of scaling gain.
 5. A method as in claim 1, further comprising: determining a value of a scaling gain for at least some frequency bands, where a frequency band comprises at least two frequency bins; and calculating smoothed frequency band scaling gain values, comprising for said at least some of said frequency bands combining a currently determined value of the scaling gain and a previously determined value of the smoothed frequency band scaling gain.
 6. A method as in claim 1, where determining the value of the scaling gain occurs n times per speech frame, where n is greater than one.
 7. A method as in claim 6, where n=2.
 8. A method as in claim 5, further comprising scaling a frequency spectrum of the speech signal using smoothed scaling gains, where for frequencies less than a certain frequency the scaling is performed on a per frequency bin basis, and for frequencies above the certain frequency the scaling is performed on a per frequency band basis.
 9. A method as in claim 8, where a value of the certain frequency is variable and is a function of the speech signal.
 10. A method as in claim 8, where a value of the certain frequency in a current speech frame is a function of the speech signal in a previous speech frame.
 11. A method as in claim 8, where determining the value of the scaling gain occurs n times per speech frame, where n is greater than one, and where a value of the certain frequency is variable and is a function of the speech signal.
 12. A method as in claim 8, where determining the value of the scaling gain occurs n times per speech frame, where n is greater than one, and where a value of the certain frequency is variable and is at least partially a function of the speech signal in a previous speech frame.
 13. A method as in claim 1, where scaling the frequency spectrum of the speech signal using smoothed scaling gains on the per frequency bin basis is performed on a maximum of 74 bins corresponding to 17 bands.
 14. A method as in claim 1, where scaling the frequency spectrum of the speech signal using smoothed scaling gains on the per frequency bin basis is performed on a maximum number of frequency bins corresponding to a frequency of 3700 Hz.
 15. A method as in claim 2, where for a first SNR value the value of the scaling gain is set to a minimum value, and for a second SNR value greater than the first SNR value the value of the scaling gain is set to unity.
 16. A method as in claim 15, where the first SNR value is equal to about 1 dB, and where the second SNR value is about 45 dB.
 17. A method as in claim 1, further comprising, in response to an occurrence of an inactive speech frame, resetting the plurality of smoothed scaling gain values to a minimum value.
 18. A method as in claim 1, where noise suppression is not performed in an active speech frame where a maximum noise energy, in a plurality of frequency bands, is below a threshold value, where each frequency band comprises at least two frequency bins.
 19. A method as in claim 1, further comprising, in response to an occurrence of a short-hangover speech frame, scaling the frequency spectrum of the speech signal using smoothed scaling gains determined on a per frequency band basis for a first x frequency bands, where each frequency band comprises at least two frequency bins, and scaling remaining frequency bands of the frequency spectrum of the speech signal using a single value of the scaling gain that is updated n times per speech frame, where n is greater than one.
 20. A method as in claim 19, where the first x frequency bands correspond to a frequency up to 1700 Hz.
 21. A method as in claim 1, where for a narrowband speech signal the method further comprises scaling the frequency spectrum of the speech signal using smoothed scaling gains determined on a per frequency band basis for a first x frequency bands, where each frequency band comprises at least two frequency bins and the first x frequency bands correspond to a frequency up to 3700 Hz, scaling the frequency spectrum of the frequency bins between 3700 Hz and 4000 Hz using the value of the scaling gain at the frequency bin corresponding to 3700 Hz, and zeroing the remaining frequency bands of the frequency spectrum of the speech signal.
 22. A method as in claim 21, where the narrowband speech signal is one that is upsampled to 12800 Hz.
 23. A method as in claim 1, comprising preprocessing the speech signal.
 24. A method as in claim 23, where preprocessing comprises high pass filtering and pre-emphasizing.
 25. A method as in claim 8, where the certain frequency is related to a voicing cut-off frequency, further comprising determining the voicing cut-off frequency using a computed voicing measure.
 26. A method as in claim 25, further comprising determining a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of x bands and a maximum of y bands, where each frequency band comprises at least two frequency bins.
 27. A method as in claim 26, where x=3 and where y=17.
 28. A method as in claim 25, where the voicing cut-off frequency is bounded so as to be equal to or greater than 325 Hz and equal to or less than 3700 Hz.
 29. A method as in claim 26, where a decision whether to update noise energy estimates per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band.
 30. A method for noise suppression of a speech signal, comprising: for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, partitioning the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between, said boundary frequency differentiating between noise suppression techniques; and changing a value of the boundary frequency as a function of the spectral content of the speech signal.
 31. A method as in claim 30, further comprising scaling a frequency spectrum of the speech signal using smoothed scaling gains, where for frequencies less than the boundary frequency the scaling is performed on a per frequency bin basis, and for frequencies above the boundary frequency the scaling is performed on a per frequency band basis, where a frequency band comprises at least two frequency bins.
 32. A method as in claim 30, where the noise suppression techniques comprise per frequency bin and per frequency band techniques, where a frequency band comprises at least two frequency bins.
 33. A method as in claim 30, where the value of the boundary frequency in a current speech frame is at least partially a function of the speech signal in a previous speech frame.
 34. A method as in claim 31, further comprising: determining a value of a scaling gain for at least some of said frequency bins; and calculating smoothed scaling gain values, comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain.
 35. A method as in claim 31, where scaling the frequency spectrum of the speech signal on the per frequency bin basis is performed on a maximum of 74 bins corresponding to 17 bands.
 36. A method as in claim 31, where scaling the frequency spectrum of the speech signal on the per frequency bin basis is performed on a maximum number of frequency bins corresponding to a boundary frequency of 3700 Hz.
 37. A method as in claim 34, where determining a value of a scaling gain comprises using a signal-to-noise ratio (SNR).
 38. A method as in claim 37, where for a first SNR value the value of the scaling gain is set to a minimum value, and for a second SNR value greater than the first SNR value the value of the scaling gain is set to unity.
 39. A method as in claim 38, where the first SNR value is equal to about 1 dB, and where the second SNR value is about 45 dB.
 40. A method as in claim 34, where calculating a smoothed scaling gain value uses a smoothing factor having a value that is inversely related to the scaling gain.
 41. A method as in claim 34, further comprising, in response to an occurrence of an inactive speech frame, resetting smoothed scaling gain values to a minimum value.
 42. A method as in claim 30, where noise suppression is not performed in an active speech frame where a maximum noise energy, in a plurality of frequency bands, is below a threshold value, where a frequency band comprises at least two frequency bins.
 43. A method as in claim 31, further comprising, in response to an occurrence of a short-hangover speech frame, scaling the frequency spectrum of the speech signal using smoothed scaling gains determined on a per band basis for a first x frequency bands, and scaling remaining frequency bands of the frequency spectrum of the speech signal using a single value of the scaling gain that is updated n times per speech frame, where n is greater than one.
 44. A method as in claim 43, where the first x frequency bands correspond to a frequency up to 1700 Hz.
 45. A method as in claim 30, where for a narrowband speech signal the method further comprises scaling the frequency spectrum of the speech signal using smoothed scaling gains determined on a per frequency band basis for a first x frequency bands, where each frequency band comprises at least two frequency bins and the first x frequency bands correspond to a frequency up to 3700 Hz, scaling the frequency spectrum of the frequency bins between 3700 Hz and 4000 Hz using the value of the scaling gain at the frequency bin corresponding to 3700 Hz, and zeroing the remaining frequency bands of the frequency spectrum of the speech signal.
 46. A method as in claim 45, where the narrowband speech signal is one that is upsampled to 12800 Hz.
 47. A method as in claim 30, comprising preprocessing the speech signal.
 48. A method as in claim 47, where preprocessing comprises high pass filtering and pre-emphasizing.
 49. A method as in claim 34, where determining the value of the scaling gain occurs n times per speech frame, where n is greater than one.
 50. A method as in claim 49, where n=2.
 51. A method as in claim 30, where the value of the boundary frequency is a function of a voicing cut-off frequency, further comprising determining the voicing cut-off frequency using a computed voicing measure.
 52. A method as in claim 51, further comprising determining a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of x bands and a maximum of y bands.
 53. A method as in claim 52, where x=3 and where y=17.
 54. A method as in claim 51, where the voicing cut-off frequency is bounded so as to be equal to or greater than 325 Hz and equal to or less than 3700 Hz.
 55. A method as in claim 52, where a decision whether to update noise energy estimates per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band.
 56. A speech encoder, comprising a noise suppressor for a speechsignal having a frequency domain representation dividable into aplurality of frequency bins, said noise suppressor operable to determinea value of a scaling gain for at least some of said frequency bins andto calculate smoothed scaling gain values for said at least some of saidfrequency bins by combining a currently determined value of the scalinggain and a previously determined value of the smoothed scaling gain. 57.A speech encoder as in claim 56, where said noise suppressor uses asignal-to-noise ratio (SNR) when determining the value of the scalinggain.
 58. A speech encoder as in claim 56, where calculating a smoothedscaling gain value uses a smoothing factor having a value that isinversely related to the scaling gain.
 59. A speech encoder as in claim56, where calculating a smoothed scaling gain uses a smoothing factorhaving a value determined so that smoothing is stronger for smallervalues of scaling gain.
 60. A speech encoder as in claim 56, said noisesuppressor further operable to determine a value of a scaling gain forat least some frequency bands, where a frequency band comprises at leasttwo frequency bins and to calculate smoothed frequency band scaling gainvalues, comprising for said at least some of said frequency bands, bycombining a currently determined value of the scaling gain and apreviously determined value of the smoothed frequency band scaling gain.61. A speech encoder as in claim 56, where determining the value of thescaling gain occurs n times per speech frame, where n is greater thanone.
 62. A speech encoder as in claim 61, where n=2.
63. A speech encoder as in claim 60, said noise suppressor further comprising a scaling unit to scale a frequency spectrum of the speech signal using smoothed scaling gains on one of the per frequency bin basis or the per frequency band basis, where for frequencies less than a certain frequency the scaling is performed on the per frequency bin basis, and for frequencies above the certain frequency the scaling is performed on the per frequency band basis.
64. A speech encoder as in claim 63, where a value of the certain frequency is variable and is a function of the speech signal.
65. A speech encoder as in claim 63, where a value of the certain frequency in a current speech frame is at least partially a function of the speech signal in a previous speech frame.
66. A speech encoder as in claim 63, where said noise suppressor determines the value of the scaling gain n times per speech frame, where n is greater than one, and where a value of the certain frequency is variable and is at least partially a function of the speech signal in a previous speech frame.
67. A speech encoder as in claim 56, where said noise suppressor scales the frequency spectrum of the speech signal using smoothed scaling gains on the per frequency bin basis on a maximum of 74 bins corresponding to 17 bands.
68. A speech encoder as in claim 56, where said noise suppressor scales the frequency spectrum of the speech signal using smoothed scaling gains on the per frequency bin basis on a maximum number of frequency bins corresponding to a frequency of 3700 Hz.
69. A speech encoder as in claim 57, where for a first SNR value the value of the scaling gain is set to a minimum value, and for a second SNR value greater than the first SNR value the value of the scaling gain is set to unity.
70. A speech encoder as in claim 69, where the first SNR value is equal to about 1 dB, and where the second SNR value is about 45 dB.
71. A speech encoder as in claim 56, where said noise suppressor is responsive to an occurrence of an inactive speech frame to reset the plurality of smoothed scaling gain values to a minimum value.
72. A speech encoder as in claim 56, where said noise suppressor does not suppress noise in an active speech frame where a maximum noise energy, in a plurality of frequency bands, is below a threshold value.
73. A speech encoder as in claim 56, where said noise suppressor is responsive to an occurrence of a short-hangover speech frame to scale the frequency spectrum of the speech signal using smoothed scaling gains determined on a per band basis for a first x frequency bands, where each frequency band comprises at least two frequency bins, and to scale remaining frequency bands of the frequency spectrum of the speech signal using a single value of the scaling gain that is updated n times per speech frame, where n is greater than one.
74. A speech encoder as in claim 73, where the first x frequency bands correspond to a frequency up to 1700 Hz.
75. A speech encoder as in claim 56, where said noise suppressor is responsive to a narrowband speech signal to scale the frequency spectrum of the speech signal using smoothed scaling gains determined on a per band basis for a first x frequency bands, where each frequency band comprises at least two frequency bins and the first x frequency bands correspond to a frequency up to 3700 Hz, to scale the frequency spectrum of the frequency bins between 3700 Hz and 4000 Hz using the value of the scaling gain at the frequency bin corresponding to 3700 Hz, and to zero the remaining frequency bands of the frequency spectrum of the speech signal.
76. A speech encoder as in claim 75, where the narrowband speech signal is one that is upsampled to 12800 Hz.
77. A speech encoder as in claim 56, further comprising at least one preprocessor for preprocessing an input speech signal prior to application of the speech signal to said noise suppressor.
78. A speech encoder as in claim 77, where said at least one preprocessor comprises a high pass filter and a pre-emphasizer.
79. A speech encoder as in claim 63, where the certain frequency is related to a voicing cut-off frequency that is determined using a computed voicing measure.
80. A speech encoder as in claim 79, where said noise suppressor determines a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of x bands and a maximum of y bands.
81. A speech encoder as in claim 80, where x=3 and where y=17.
82. A speech encoder as in claim 80, where the voicing cut-off frequency is bounded so as to be equal to or greater than 325 Hz and equal to or less than 3700 Hz.
83. A speech encoder as in claim 80, where said noise suppressor makes a decision whether to update noise energy estimates per critical band during inactive speech periods based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band.
84. A speech encoder, comprising a noise suppressor for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, said noise suppressor operable to partition the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between, said boundary frequency differentiating between noise suppression techniques, said noise suppressor further operable to change a value of the boundary frequency as a function of the spectral content of the speech signal.
85. A speech encoder as in claim 84, where said noise suppressor further comprises a scaler to scale a frequency spectrum of the speech signal using smoothed scaling gains, where for frequencies less than the boundary frequency the scaling is performed on a per frequency bin basis, and for frequencies above the boundary frequency the scaling is performed on a per frequency band basis, where a frequency band comprises at least two frequency bins.
86. A speech encoder as in claim 84, where the noise suppression techniques comprise per frequency bin and per frequency band techniques, where a frequency band comprises at least two frequency bins.
87. A speech encoder as in claim 84, where the value of the boundary frequency in a current speech frame is at least partially a function of the speech signal in a previous speech frame.
88. A speech encoder as in claim 85, where said noise suppressor further comprises a unit to determine a value of a scaling gain for individual ones of said frequency bands and to calculate smoothed scaling gain values, and for at least some of said frequency bands to combine a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain; where determining the value of a scaling gain occurs n times per speech frame, where n is greater than one, and where the value of the boundary frequency is at least partially a function of the speech signal in a previous speech frame.
89. A speech encoder as in claim 85, where said scaler uses smoothed scaling gains on the per frequency bin basis on a maximum of 74 bins corresponding to 17 bands.
90. A speech encoder as in claim 85, where said scaler uses smoothed scaling gains on the per frequency bin basis on a maximum number of frequency bins corresponding to a boundary frequency of 3700 Hz.
91. A speech encoder as in claim 85, where a value of the scaling gain is determined using a signal-to-noise ratio (SNR).
92. A speech encoder as in claim 86, where a value of the smoothing factor is inversely related to the scaling gain.
93. A speech encoder as in claim 92, where for a first SNR value the value of the scaling gain is set to a minimum value, and for a second SNR value greater than the first SNR value the value of the scaling gain is set to unity.
94. A speech encoder as in claim 93, where the first SNR value is equal to about 1 dB, and where the second SNR value is about 45 dB.
95. A speech encoder as in claim 85, where said noise suppressor is responsive to an occurrence of an inactive speech frame to reset smoothed scaling gain values to a minimum value.
96. A speech encoder as in claim 84, where noise suppression is not performed in an active speech frame where a maximum noise energy, in a plurality of frequency bands, is below a threshold value, where a frequency band comprises at least two frequency bins.
97. A speech encoder as in claim 85, where said noise suppressor is responsive to an occurrence of a short-hangover speech frame to scale the frequency spectrum of the speech signal using smoothed scaling gains determined on a per band basis for a first x frequency bands, and to scale remaining frequency bands of the frequency spectrum of the speech signal using a single value of the scaling gain that is updated n times per speech frame, where n is greater than one.
98. A speech encoder as in claim 97, where the first x frequency bands correspond to a frequency up to 1700 Hz.
99. A speech encoder as in claim 85, where said noise suppressor is responsive to a presence of a narrowband speech signal to scale the frequency spectrum of the speech signal using smoothed scaling gains determined on a per band basis for a first x frequency bands, where the first x frequency bands correspond to a frequency up to 3700 Hz, to scale the frequency spectrum of the frequency bins between 3700 Hz and 4000 Hz using the value of the scaling gain at the frequency bin corresponding to 3700 Hz, and to zero the remaining frequency bands of the frequency spectrum of the speech signal.
100. A speech encoder as in claim 99, where the narrowband speech signal is one that is upsampled to 12800 Hz.
101. A speech encoder as in claim 84, further comprising at least one preprocessor for preprocessing an input speech signal prior to application of the speech signal to said noise suppressor.
102. A speech encoder as in claim 101, where said at least one preprocessor comprises a high pass filter and a pre-emphasizer.
103. A speech encoder as in claim 84, where the value of the boundary frequency is a function of a voicing cut-off frequency that is determined using a computed voicing measure.
104. A speech encoder as in claim 103, where said noise suppressor determines a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of x bands and a maximum of y bands.
105. A speech encoder as in claim 104, where x=3 and where y=17.
106. A speech encoder as in claim 104, where the voicing cut-off frequency is bounded so as to be equal to or greater than 325 Hz and equal to or less than 3700 Hz.
107. A speech encoder as in claim 104, where said noise suppressor makes a decision whether to update noise energy estimates per critical band during inactive speech periods based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band.
108. A speech encoder, comprising means for suppressing noise in a speech signal having a frequency domain representation dividable into a plurality of frequency bins, said noise suppressing means comprising means for partitioning the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary there between, and for changing the boundary as a function of the spectral content of the speech signal, said noise suppressing means further comprising means for determining a value of a scaling gain for at least some of said frequency bins and for calculating smoothed scaling gain values for said at least some of said frequency bins by combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain, where calculating a smoothed scaling gain value uses a smoothing factor having a value determined so that smoothing is stronger for smaller values of scaling gain, said noise suppressing means further comprising means for determining a value of a scaling gain for at least some frequency bands, where a frequency band comprises at least two frequency bins, and for calculating smoothed frequency band scaling gain values, said noise suppressing means further comprising means for scaling a frequency spectrum of the speech signal using the smoothed scaling gains, where for frequencies less than the boundary the scaling is performed on a per frequency bin basis, and for frequencies above the boundary the scaling is performed on a per frequency band basis.
109. A speech encoder as in claim 108, where the boundary comprises a frequency that is a function of a voicing cut-off frequency that is determined using a computed voicing measure, where said noise suppressing means determines a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of x bands and a maximum of y bands, where x=3 and where y=17, and where the voicing cut-off frequency is bounded so as to be equal to or greater than 325 Hz and equal to or less than 3700 Hz.
110. A computer program embodied on a computer readable medium, comprising program instructions for performing noise suppression of a speech signal, comprising operations of, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, determining a value of a scaling gain for at least some of said frequency bins and calculating smoothed scaling gain values, comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain.
111. A computer program as in claim 110, the operations further comprising determining a value of a scaling gain for at least some frequency bands, where a frequency band comprises at least two frequency bins, and calculating smoothed frequency band scaling gain values, comprising for said at least some of said frequency bands combining a currently determined value of the scaling gain and a previously determined value of the smoothed frequency band scaling gain.
112. A computer program as in claim 111, the operations further comprising scaling a frequency spectrum of the speech signal using smoothed scaling gains, where for frequencies less than a certain frequency the scaling is performed on a per frequency bin basis, and for frequencies above the certain frequency the scaling is performed on a per frequency band basis.
113. A computer program as in claim 112, where a value of the certain frequency is variable and is a function of the speech signal.
114. A computer program as in claim 112, where the certain frequency is related to a voicing cut-off frequency, further comprising an operation of determining the voicing cut-off frequency using a computed voicing measure.
115. A computer program as in claim 114, further comprising an operation of determining a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of three bands and a maximum of seventeen bands.
116. A computer program as in claim 114, where the voicing cut-off frequency is bounded so as to be equal to or greater than about 325 Hz and equal to or less than about 3700 Hz.
117. A computer program as in claim 114, where a decision whether to update noise energy estimates per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band.
118. A computer program embodied on a computer readable medium, comprising program instructions for performing noise suppression of a speech signal, comprising operations of, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, partitioning the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between and changing a value of the boundary frequency as a function of the spectral content of the speech signal.
119. A computer program as in claim 118, the operations further comprising scaling a frequency spectrum of the speech signal using smoothed scaling gains, where for frequencies less than the boundary frequency the scaling is performed on a per frequency bin basis, and for frequencies above the boundary frequency the scaling is performed on a per frequency band basis, where a frequency band comprises at least two frequency bins.
120. A computer program as in claim 118, where the value of the boundary frequency in a current speech frame is at least partially a function of the speech signal in a previous speech frame.
121. A computer program as in claim 119, the operations further comprising determining a value of a scaling gain for individual ones of said frequency bands and calculating smoothed scaling gain values, comprising for at least some of said frequency bands, an operation of combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain, where determining the value of a scaling gain occurs n times per speech frame, where n is greater than one, and where a value of the boundary frequency is a function of the speech signal in a previous speech frame.
122. A computer program as in claim 118, where the boundary frequency is related to a voicing cut-off frequency, further comprising an operation of determining the voicing cut-off frequency using a computed voicing measure.
123. A computer program as in claim 122, further comprising an operation of determining a number of critical bands having an upper frequency that does not exceed the voicing cut-off frequency, where bounds are set such that per frequency bin processing is performed on a minimum of three bands and a maximum of seventeen bands.
124. A computer program as in claim 122, where the voicing cut-off frequency is bounded so as to be equal to or greater than about 325 Hz and equal to or less than about 3700 Hz.
125. A computer program as in claim 122, where a decision whether to update noise energy estimates per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band.
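The SNR-dependent scaling gain and its recursive smoothing, as recited in claims 56 to 59, 62, 69 to 71 and 110, can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: the linear-in-dB mapping between the 1 dB and 45 dB SNR points, the 14 dB maximum attenuation, and the smoothing factor alpha = 1 - g are assumptions chosen only because they satisfy the claim language (minimum gain at about 1 dB SNR, unity gain at about 45 dB SNR, smoothing that is stronger for smaller gains). All function names are hypothetical.

```python
import math

SNR_LOW_DB = 1.0      # SNR at which the gain is at its minimum (claims 69-70)
SNR_HIGH_DB = 45.0    # SNR at which the gain reaches unity (claims 69-70)
MAX_ATTEN_DB = 14.0   # ASSUMPTION: maximum attenuation, not stated in the claims


def scaling_gain(snr_db: float, max_atten_db: float = MAX_ATTEN_DB) -> float:
    """Hypothetical SNR-to-gain mapping, linear in dB between the two SNR points."""
    if snr_db <= SNR_LOW_DB:
        gain_db = -max_atten_db
    elif snr_db >= SNR_HIGH_DB:
        gain_db = 0.0
    else:
        frac = (snr_db - SNR_LOW_DB) / (SNR_HIGH_DB - SNR_LOW_DB)
        gain_db = -max_atten_db * (1.0 - frac)
    return math.pow(10.0, gain_db / 20.0)


def smooth_gains(prev_smoothed: list[float], gains: list[float]) -> list[float]:
    """Combine each current gain with the previously smoothed gain of the same bin.

    ASSUMPTION: alpha = 1 - g. A small gain gives a large alpha, i.e. more weight
    on the previous smoothed value, so smoothing is stronger for smaller gains.
    """
    out = []
    for g_prev, g in zip(prev_smoothed, gains):
        alpha = 1.0 - g
        out.append(alpha * g_prev + (1.0 - alpha) * g)
    return out


def reset_on_inactive(num_bins: int, max_atten_db: float = MAX_ATTEN_DB) -> list[float]:
    """Reset all smoothed gains to the minimum gain on an inactive frame (claim 71)."""
    return [math.pow(10.0, -max_atten_db / 20.0)] * num_bins


if __name__ == "__main__":
    snrs = [0.0, 10.0, 30.0, 50.0]           # illustrative per-bin SNRs in dB
    smoothed = reset_on_inactive(len(snrs))  # start from the minimum gain
    for _ in range(2):                       # two gain updates per frame (claim 62)
        smoothed = smooth_gains(smoothed, [scaling_gain(s) for s in snrs])
    print([round(g, 3) for g in smoothed])
```

With alpha = 1 - g, a bin at high SNR (g = 1) follows its current gain immediately, while a bin at low SNR keeps most of its previous smoothed value, which limits frame-to-frame gain fluctuations where the noise dominates.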
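The variable boundary between per-bin and per-band scaling (claims 63, 67, 68, 79 to 82 and 84 to 85) can be sketched in the same hedged way. The sketch assumes a 256-point FFT at the 12800 Hz internal rate, i.e. 50 Hz per bin, which is consistent with 74 bins corresponding to 3700 Hz in claims 67 and 68. It also assumes a particular table of critical-band upper edges in which band 17 ends at 3700 Hz; that table, the function names and the gains used in the example are illustrative assumptions, not taken from this section. The voicing cut-off frequency is taken as an input, since the claims do not state how it is derived from the voicing measure.

```python
# ASSUMPTION: upper edges (Hz) of the first 20 critical bands; band 17 ends at
# 3700 Hz, matching the 74-bin / 17-band figures recited in claims 67-68.
BAND_UPPER_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6350]
BIN_HZ = 12800 // 256                  # ASSUMPTION: 256-point FFT at 12.8 kHz -> 50 Hz
MIN_BANDS, MAX_BANDS = 3, 17           # band-count bounds (claims 81, 105)
FC_MIN_HZ, FC_MAX_HZ = 325.0, 3700.0   # cut-off bounds (claims 82, 106)


def per_bin_band_count(voicing_cutoff_hz: float) -> int:
    """Number of critical bands whose upper edge does not exceed the cut-off."""
    fc = min(max(voicing_cutoff_hz, FC_MIN_HZ), FC_MAX_HZ)
    count = sum(1 for edge in BAND_UPPER_HZ if edge <= fc)
    return min(max(count, MIN_BANDS), MAX_BANDS)


def boundary_bin(voicing_cutoff_hz: float) -> int:
    """Number of low-frequency bins scaled with per-bin gains."""
    return BAND_UPPER_HZ[per_bin_band_count(voicing_cutoff_hz) - 1] // BIN_HZ


def apply_gains(spectrum, bin_gains, band_gains, voicing_cutoff_hz):
    """Scale bins below the boundary per bin, and the others with their band's gain."""
    k_b = boundary_bin(voicing_cutoff_hz)
    out = []
    for k, x in enumerate(spectrum):
        if k < k_b:
            out.append(x * bin_gains[k])
        else:
            # HYPOTHETICAL lookup of the band containing bin k; bins at or above
            # the last listed edge are assigned to the last band.
            band = next((i for i, e in enumerate(BAND_UPPER_HZ) if k * BIN_HZ < e),
                        len(BAND_UPPER_HZ) - 1)
            out.append(x * band_gains[band])
    return out
```

For example, a voicing cut-off of 1000 Hz gives 8 bands under this assumed table, so the bins up to 920 Hz (18 bins) would be scaled per bin and the remaining bins per band; a cut-off above 3700 Hz is clamped, giving the maximum of 17 bands and 74 bins.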